Visit Our UNIX and Linux User Community


Reverse complement


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers Reverse complement
# 1  
Old 07-26-2015
Reverse complement

I want to reverse some DNA sequences and complement them at the same time. Thus, A changes to T; C to G; T to A and G to C.
example:
infile
Code:
>GHL8OVD01CMQVT SHORT1
TTGATGT
>GHL8OVD01CMQVT SHORT2
TTGATGT

outfile:
Code:
>GHL8OVD01CMQVT SHORT1
ACATCAA
>GHL8OVD01CMQVT SHORT2
ACATCAA

The Identifier (> XXXXX) should not be modified
This is the code I want to modify:
Code:
awk ' !(NR%2) ' infile | rev | tr ACGT TGCA

However, the Ids are not being printed. If I include NR%2, the Ids will also be reverse complemented
I know I can always use perl:
Code:
perl -nle'BEGIN {
  @map{ A, C, G, T } = ( T, G, C, A )
  }
  print /^>/ ?
    $_ :
      join //, map $map{ $_ }, split //, scalar reverse
  ' infile

But I am trying to simplify the script so I can explain it better
# 2  
Old 07-26-2015
Can't you apply what you learned from your thread Cut & awk four days ago to this thread? You have exactly the same problem assuming that some elements of a pipeline will only process some of the lines they are fed or that lines thrown away by some element of a pipeline will still magically appear in your output.

You didn't ask any questions about the suggestions you were given there, so we assume that you understand how those suggestions work.
# 3  
Old 07-26-2015
Code:
perl -ple 'y/ACGT/TGCA/ and $_ = reverse unless /^>/' infile
>GHL8OVD01CMQVT SHORT1
ACATCAA
>GHL8OVD01CMQVT SHORT2
ACATCAA

These 2 Users Gave Thanks to Aia For This Post:
# 4  
Old 07-26-2015
sed approach (not necessarily easier to explain...):
Code:
sed '/>/n; y/ATCG/TAGC/;s/^.*$/X&X/;:x;s/\(X.\)\(.*\)\(.X\)/\3\2\1/;tx;s/X//g' file
>GHL8OVD01CMQVT SHORT1
ACATCAA
>GHL8OVD01CMQVT SHORT2
ACATCAA

or, if you have GNU sed with its extensions:
Code:
sed '/>/n; y/ATCG/TAGC/;s/^/echo /;s/$/ | rev/;e' file

This User Gave Thanks to RudiC For This Post:
# 5  
Old 07-26-2015
Quote:
Private message from Xterra
Subject: linux commands "inside" awk

Don
Maybe I did not explain myself clearly. I would like to learn if I can use regular Linux commands such as cut, tr, rev, inside the awk script.
My question about awk and cut was answered using substr -nice alternative though. And I have been working on this

Code:
awk '{ for(i=length;i!=0;i--) x=(x substr($0,i,1))}{print x;x=""}'

I love Aia's solution for this task

Code:
perl -ple 'y/ACGT/TGCA/ and $_ = reverse unless /^>/'

Which is nicer than the code I have been using this far

Code:
perl -nle'BEGIN {
@map{ A, C, G, T } = ( T, G, C, A )
}
print /^>/ ?
$_ :
join //, map $map{ $_ }, split //, scalar reverse
'

Still, I would like to know if rev | tr can be used within the awk code for this particular example so it can be applied specifically to even rather than all lines on the infile. I do not want to transform the infile into a temporary file, reverse and complement and then rebuilt the file like my example for sort

Code:
awk '{printf("%s%s",$0,(NR%2)?"\t":"\n")}' input|sort -rk 3|tr '\t' '\n'

(thanks once again for that
I have several solutions but now I am trying to learn "easier" ways to write my scripts. Maybe this is not possible and that's why I asked. I have been working on it, I just do not seem to get it to work to satisfy my requirements
Cheers!
Moderator's Comments:
Mod Comment Please do not send private messages asking technical questions. It is forbidden by the forum rules. If you have a technical question about a post in the forums, it is probably of interest to others as well. And, the volunteers working here to try to answer questions don't have time to answer questions individually when answering the question once in the technical forums can help a lot more readers.


Hi Xterra,
You can use system() to run shell commands inside awk, but invoking a shell to invoke rev and tr once for each even numbered line in your file will take at least two orders of magnitude longer to run than building equivalent functionality into your awk script. If we write an awk script to print odd numbered lines and feed even numbered lines through rev and tr:
Code:
#!/bin/ksh
IAm=${0##*/}
tmpf="$IAm.$$"
awk -v tmpf="$tmpf" '
FNR % 2
!(FNR % 2) {
	print > tmpf
	close(tmpf)
	system("rev \"" tmpf "\" | tr ACGT TGCA")
}
' ${1:-infile}
rm -f "$tmpf"

it is easy to understand and, with an input file containing 10,000 copies of your sample input file, the average of timing 10 runs (with output redirected to a file) is about:
Code:
real	1m5.37s
user	0m41.09s
sys	0m49.33s

A similar awk script building the rev and tr functionality into an internal function:
Code:
#!/bin/ksh
awk '
BEGIN {	c["A"] = "T"; c["C"] = "G"; c["G"] = "C"; c["T"] = "A" }
function revcomp(	i, o) {
	o = ""
	for(i = length; i > 0; i--)
		o = o c[substr($0, i, 1)]
	return(o)
}
!(FNR % 2) {$0 = revcomp()}
1' ${1:-infile}

produces exactly the same output and takes about:
Code:
real	0m0.16s
user	0m0.15s
sys	0m0.00s

In other words this awk script processes a little more than 800 lines in the time it take to process 2 lines firing up a pipeline to process the even lines.

The average timing for Aia's perl suggestion was:
Code:
real	0m0.03s
user	0m0.02s
sys	0m0.01s

For some reason the BSD based sed on OS X produced the wrong output (with leading and trailing X characters on even numbered lines; the lines had been translated but not reversed) without producing any diagnostics when running RudiC's sed script. But an equivalent command (splitting on semicolons into separate sed editing commands):
Code:
sed -e '/>/n' -e 'y/ATCG/TAGC/' -e 's/^.*$/X&X/' -e ':x' -e 's/\(X.\)\(.*\)\(.X\)/\3\2\1/' -e 'tx' -e's/X//g' infile

produced the expected output with average timing output of:
Code:
real	0m0.09s
user	0m0.09s
sys	0m0.00s


Last edited by Don Cragun; 07-26-2015 at 09:36 PM.. Reason: Properly attribute RudiC for his sed script.
These 2 Users Gave Thanks to Don Cragun For This Post:
 

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Reverse of a string

Hi All, I have a String str="Manish". I would like to reverse it. I know the option to do this in bash is: echo "Manish" | rev but I have seen an alternate solution somewhere, which states that: str="Manish" echo $str | awk '{ for(i=length($0);i>=1;i--) printf("%s",substr($0,i,1));... (7 Replies)
Discussion started by: manishdivs
7 Replies

2. Shell Programming and Scripting

reverse matching

Hello guys How can I use egrep to match word1 but not word2...word1. What I mean suppose that I have the following text, and my word1=pizza and word2=eat I hate to eat pizza because I ma eating it each day Pizza is good I like vegetarian and Italian Pizza eating healthy food is... (7 Replies)
Discussion started by: fdc2suxs
7 Replies

3. UNIX for Dummies Questions & Answers

Two complement to double conversion

Is there a reuse code somewhere for conversion of two complement to a 64-bit double? Any pointer is greatly appreciated! GG (0 Replies)
Discussion started by: NAVTime
0 Replies

4. Shell Programming and Scripting

how to reverse file

i am using AIX -ksh how can i reverse any file ,i have already try tac cmd it is not in AIX: please help me out. (3 Replies)
Discussion started by: RahulJoshi
3 Replies

5. Shell Programming and Scripting

reverse an integer

i have created a script that will reverse any given ineter. #!/bin/ksh echo "Enter the number" read n if then a=`expr $n / 10` b=`expr $n % 10` c=`expr $b \* 10 + $a` fi echo $c --------------------------------------------------------------------- the problem with this script... (4 Replies)
Discussion started by: ali560045
4 Replies

6. Shell Programming and Scripting

reverse sort

Hello, How do i sort a csv file. i should be sorting column1(varchar),column2*(varchar) in ascending and column4 in descending order(numeric datatype). I tried few combinations of sort, but doesn't seem to be getting the right result. sort -t "," -k 1 -k 2 -k 4nr file any help is... (3 Replies)
Discussion started by: markjason
3 Replies

7. Shell Programming and Scripting

Reverse FTP

Hi Everybody, I want to write a script in unix which will automatically FTP a .txt file from my client machine D: drive(Windows) That is I want to FTP a file from my PC to UNIX box but this should be done from UNIX box by a shell script. (i.e. I will invoke the script in UNIX and FTP will be... (4 Replies)
Discussion started by: ganesh123
4 Replies

8. Shell Programming and Scripting

reverse of basename

Hi, Can someone let me know how to find the reverse of the basename i.e i have /apps/tiv/pmon/xxxx.dat and I want /apps/tiv/pmon/ Thanks (7 Replies)
Discussion started by: braindrain
7 Replies

9. Shell Programming and Scripting

Reverse *

when I do $ ls z* List of all files begining with 'z'. But what if I want to do a reverse lookup. Just for interest sake ;) $ ls ztr should be same as $ ls ztr* $ ls zt* $ ls z* (2 Replies)
Discussion started by: azmathshaikh
2 Replies

Featured Tech Videos