Need help with a file that prints letters from a file according to another file!
So basically what I want to do is pull out DNA sequences for a particular gene name.
I have 2 files (FILE1 and FILE2) and I want the output written to a separate file (FILE3). FILE1 and FILE2 are MASSIVE, so I am only posting examples from each file.

FILE1 looks like this (tab delimited, 4 columns):

##gff-version 1
1154    10      +    AAD6
418     7429    +    AAH1
702     759     +    AAT1
584     10      -    ABF2
642     4894    -    ACC1
651     7213    -    ACN9
1055    3454    -    ADE1

The next file, FILE2, looks like this:

>1154
ATCTCACTCGTAATTCTACATAATTTTGTTTATGCTTTTATTGTCATTTTATATATTGTCAGTCATTATCCTATTACATTATCAATCCTTGCATTTCAGC
TTCCACTTATTTCGATGACCGCTTCTCATAACTTATGTCATCTTCTAACACCGTATATGATAATGTACCAGTAGTATGAC
>584
GCAAGCTTTATAGTGACAACAATAAGGTATCACTCGGTTACAATTACCCCCACTTCCCCT

What I want to do is match column 1 of FILE1 with the ># header in FILE2. So for example, 1154 from FILE1 will match up with >1154 from FILE2. Next, I want it to find the position given in column 2 (so for 1154, it will find the 10th letter, which happens to be G). If column 3 of FILE1 is +, it will print the 8 letters in front of it (i.e. the 8 letters in front of G would be TCTCACTC).

But if it is - in column 3, then it will take the reverse. So for ABF2 on 584, it will take the top 8 sequences starting from the reverse end. So instead of starting at G at >584, it will start at T (the end). So the position of ABF2 will be 25 letters away from T, so the letter will be C. Then it will take the values behind it, so CCACTTCC.

The output file will contain column 4 of FILE1, the 8 letters from FILE2 and column 3 from FILE1. The final file (FILE3) will look like this:

AAD6 TCTCACTC +
ABF2 CCACTTCC -

Could someone give me some help with this? I am new to Perl and I have been put in a situation where I have to program at a very high level. Thanks
|
||||
|
I am not clear on this part:
But if it is - on column 3, then it will take the reverse. So for ABF2 on “584” it will take the top 8 sequences starting from the reverse end. So instead of starting at “G” at >584, it will start at “T” (the end). So the position of ABF2 will be 25 letters away from “T”, so the letter will be “C”. Then it will take the values behind it… so CCACTTCC.
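For what it is worth, here is a quick check of both rules against the sample sequences from the question. It is only a sketch: the `take8` helper name is made up for illustration, and the offset used for the - case (sequence length minus the column 2 value) is an assumption, picked only because it reproduces the CCACTTCC shown in the expected output. It may not be what the poster actually intends.

```shell
# take8 SEQ POS STRAND: print the 8 letters selected by one reading of the rules.
# Hypothetical helper; the "-" offset is an assumption that matches the sample output.
take8() {
    printf '%s\n' "$1" | awk -v pos="$2" -v strand="$3" '{
        if (strand == "+")
            print substr($0, pos - 8, 8)            # 8 letters just before position pos
        else
            print substr($0, length($0) - pos, 8)   # 8 letters starting at offset length-pos
    }'
}

plus=$(take8 "ATCTCACTCGTAATTCTACATAATTTTG" 10 +)
minus=$(take8 "GCAAGCTTTATAGTGACAACAATAAGGTATCACTCGGTTACAATTACCCCCACTTCCCCT" 10 -)
echo "$plus $minus"    # prints: TCTCACTC CCACTTCC
```

With 10 in column 2 both sample rows come out as in FILE3, so the "25 letters away" in the description may simply be a slip.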
|
||||
|
Try this.
Code:
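awk '
# FILE1: remember position, strand and gene name for each sequence ID.
FNR == NR { pos[$1] = $2; strand[$1] = $3; gene[$1] = $4; next }
# FILE2: on a ">ID" header that is listed in FILE1, look at the sequence.
/^>/ {
    id = substr($1, 2)
    if (id in pos) {
        getline    # reads only the first sequence line after the header
        if (strand[id] == "+")
            str = substr($0, pos[id] - 8, 8)            # 8 letters before the position
        else
            str = substr($0, length($0) - pos[id], 8)   # 8 letters counted from the end
        print gene[id], str, strand[id]
    }
}' file1 file2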
cheers, Devaraj Takhellambam
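One caveat on the command above: it reads only the first line after each > header (a single getline). If the sequences in FILE2 wrap across several lines, as the 1154 sample appears to, a variant along these lines joins the wrapped lines before cutting out the 8 letters. Treat it as a sketch under assumptions rather than a drop-in replacement: the `extract` wrapper and the `emit` function are names made up here, each ID is assumed to appear only once in FILE1 (a later row would overwrite an earlier one), and the offset arithmetic is kept the same as in the command above.

```shell
# extract FILE1 FILE2: hypothetical wrapper that joins wrapped sequence lines
# per ">ID" record and then applies the same substr() arithmetic as above.
extract() {
    awk '
    # Print gene, 8-letter window and strand for the record collected so far.
    function emit(    p, str) {
        if (!(id in pos)) return
        p = pos[id]
        if (strand[id] == "+") str = substr(seq, p - 8, 8)
        else                   str = substr(seq, length(seq) - p, 8)
        print gene[id], str, strand[id]
    }
    # FILE1: skip comment lines such as ##gff-version and short or blank rows.
    FNR == NR {
        if ($0 !~ /^#/ && NF >= 4) { pos[$1] = $2; strand[$1] = $3; gene[$1] = $4 }
        next
    }
    # FILE2: a header closes the previous record and starts a new one.
    /^>/ { emit(); id = substr($1, 2); seq = ""; next }
         { seq = seq $0 }    # append wrapped sequence lines
    END  { emit() }          # flush the last record
    ' "$1" "$2"
}
```

It would be called as `extract file1 file2 > file3`. Holding one full sequence in memory at a time should be fine for most inputs, but that is worth checking against the real files.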