Outputting characters after a given string and reporting the characters in the row below --sed

# 8  
Old 1 Day Ago
Making a few wild guesses... If the output you're trying to produce is:
Code:
Codon:	AAC	Quality Score:	,ED
Codon:	TCCAAG	Quality Score:	G7DCGG
Codon:	AAC	Quality Score:	GCC
Codon:	TCCAAG	Quality Score:	DGGCGG
Codon:	AAC	Quality Score:	GCC
Codon:	TCCAAG	Quality Score:	DGGCGG
Codon:	TTT	Quality Score:	+GG

you could try something like:
Code:
awk -v lengths="3 6" -v strings="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {	nString = split(strings, String)
	split(lengths, OutLen)
	for(i = 1; i <= nString; i++)
		StringLen[i] = length(String[i])
}
/^@/ {	getline CodonLine
	getline
	getline QualityLine
	for(i = 1; i <= nString; i++)
		if(spot = index(CodonLine, String[i]))
			printf("Codon:\t%s\tQuality Score:\t%s\n",
			    substr(CodonLine, spot + StringLen[i], OutLen[i]),
			    substr(QualityLine, spot + StringLen[i], OutLen[i]))
}' file
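
For anyone puzzling over the `spot + StringLen[i]` arithmetic, here is a minimal sketch of just that step. The read and quality strings are made up for illustration (they are not from the original poster's file), and `extract_after` is a hypothetical helper name. `index()` returns the 1-based position where the search string starts, so adding the string's length gives the first character after it, and the same offset is applied to the quality line:

```shell
# Hypothetical helper: print the n characters that follow key in seq,
# together with the characters at the same positions in qual.
extract_after() {
    awk -v key="$1" -v n="$2" -v seq="$3" -v qual="$4" '
    BEGIN {
        spot = index(seq, key)          # 1-based start of key, 0 if absent
        if (spot == 0)
            exit 1                      # key not found: print nothing
        start = spot + length(key)      # first character after the key
        printf("Codon:\t%s\tQuality Score:\t%s\n",
            substr(seq, start, n), substr(qual, start, n))
    }'
}

# Made-up 22-character read and quality strings.
extract_after GCATGAAAACATACA 3 "TTGCATGAAAACATACAAACGG" "ABCDEFGHIJKLMNOPQRSTUV"
```

With this made-up input the search string starts at position 3, so the three characters reported start at position 18 in both strings.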

These 2 Users Gave Thanks to Don Cragun For This Post:
Neo (1 Day Ago) Xterra (1 Day Ago)
# 9  
Old 1 Day Ago
It was interesting to try to implement this algorithm in sed:
Code:
#!/bin/sed -nrf
2~4 h
4~4 {
H;x
s/(.*GCATGAAAACATACA.{3})(.*)/\1\r\2/
}
/\r/ {
:1
s/^.(.{2}[^\r].*)/\1/
T2
s/(\n).(.*)/\1\2/
t1
:2
s/^(.{3}).*/\1/mg
s/(.*)\n(.*)/Codon:\t\1\tQuality Score:\t \2/p
}


Last edited by nezabudka; 1 Day Ago at 09:08 AM..
These 2 Users Gave Thanks to nezabudka For This Post:
Neo (1 Day Ago) Xterra (1 Day Ago)
# 10  
Old 1 Day Ago
Don
I modified your script a bit to output the total count and add some formatting:
Code:
awk -v gene="gene-a gene-b" -v lengths="3 6" -v strings="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {	nString = split(strings, String)
	split(lengths, OutLen)
	split(gene, Id)
	for(i = 1; i <= nString; i++)
		StringLen[i] = length(String[i])
}
/^@/ {	getline CodonLine
	getline
	getline QualityLine
	for(i = 1; i <= nString; i++)
		if(spot = index(CodonLine, String[i]))
			printf("Gene:\t"Id[i]"\tCodon:\t%s\t\tQuality Score:\t%s\t\n",
			    substr(CodonLine, spot + StringLen[i], OutLen[i]),
			    substr(QualityLine, spot + StringLen[i], OutLen[i]))
}' test.txt | awk '{ count[$0]++ } END {{ print "\n\t\t\t\tSummary\n#############################################################################\nCount\t\tGene\t\tCodon\t\t\tQuality Score\n" } {for (gene in count ) print count[gene] "\t" gene | "sort -k 3"}}'

With the above script I am getting the desired output:
Code:
                                Summary
#############################################################################
Count           Gene            Codon                   Quality Score

1       Gene:   gene-a  Codon:  AAC             Quality Score:  ,ED
2       Gene:   gene-a  Codon:  AAC             Quality Score:  GCC
1       Gene:   gene-a  Codon:  TTT             Quality Score:  +GG
2       Gene:   gene-b  Codon:  TCCAAG          Quality Score:  DGGCGG
1       Gene:   gene-b  Codon:  TCCAAG          Quality Score:  G7DCGG

However, when I tried to include the END step in your awk script, I failed miserably. How can I modify the script so I don't have to "stitch" the two scripts together as shown above?
Thanks!
This User Gave Thanks to Xterra For This Post:
Neo (1 Day Ago)
# 11  
Old 22 Hours Ago
Hi Xterra,
Maybe something like:
Code:
awk -v genes="gene-a gene-b" \
    -v lengths="3 6" \
    -v strings="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {	nString = split(strings, String)
	split(lengths, SLen)
	split(genes, Id)
	for(i = 1; i <= nString; i++)
		StringLen[i] = length(String[i])
	sort_cmd = "sort -k3,3 -k5,5 -k8,8"
	print "\n\t\t\t\tSummary"
	print "#############################################################" \
	    "################"
	print "Count\t\tGene\t\tCodon\t\t\tQuality Score\n"
}
/^@/ {	getline CodonLine
	getline
	getline QualityLine
	for(i = 1; i <= nString; i++)
		if(spot = index(CodonLine, String[i]))
			count[sprintf( \
			    "Gene:\t%s\tCodon:\t%s\tQuality Score:\t%s",
			    Id[i],
			    substr(CodonLine, spot + StringLen[i], SLen[i]),
			    substr(QualityLine, spot + StringLen[i], SLen[i])) \
			]++
}
END {	for(i in count)
		printf("%d\t%s\n", count[i], i) | sort_cmd
}' test.txt

which produces the output:
Code:
				Summary
#############################################################################
Count		Gene		Codon			Quality Score

1	Gene:	gene-a	Codon:	AAC	Quality Score:	,ED
2	Gene:	gene-a	Codon:	AAC	Quality Score:	GCC
1	Gene:	gene-a	Codon:	TTT	Quality Score:	+GG
2	Gene:	gene-b	Codon:	TCCAAG	Quality Score:	DGGCGG
1	Gene:	gene-b	Codon:	TCCAAG	Quality Score:	G7DCGG

I'm sure you could write this as a 1-liner, but I much prefer something I can see on a screen (and debug).
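
In case the counting part is the piece that is hard to follow, here is a stripped-down sketch of the same idiom on its own. The function name `count_lines` and the sample codons are assumptions made up for illustration. Each distinct line becomes an array subscript, the `END` block walks the array, and the results are piped through `sort` because `for (x in array)` visits subscripts in an unspecified order:

```shell
# Hypothetical helper: count how many times each distinct input line occurs.
count_lines() {
    awk '
    { count[$0]++ }                      # one array slot per distinct line
    END {
        sort_cmd = "sort -k2,2"          # order the report by the counted text
        for (line in count)              # iteration order is unspecified
            printf("%d\t%s\n", count[line], line) | sort_cmd
        close(sort_cmd)                  # wait for sort to finish writing
    }'
}

# Made-up sample data, one codon per line.
printf 'AAC\nTTT\nAAC\nTCC\n' | count_lines
```

The longer script does the same thing, except that the subscript is built with `sprintf()` from the gene name, codon, and quality string instead of being the whole input line.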

If there's anything here you can't figure out, ask questions about what you don't understand.

Hope this helps,
Don
These 2 Users Gave Thanks to Don Cragun For This Post:
Neo (22 Hours Ago) Xterra (9 Hours Ago)
