Need help with a file that prints letters from a file according to another file!


 
# 1  
Old 05-09-2009

So basically what I want to do is pull out DNA sequences for a particular gene name.

I have 2 files (FILE1 and FILE2) and I want an output into a separate file (FILE3).

FILE1 and 2 are MASSIVE so I am only posting examples from each file.

So FILE1 looks like this (tab-delimited, 4 columns):

##gff-version 1

1154 10 + AAD6
418 7429 + AAH1
702 759 + AAT1
584 10 - ABF2
642 4894 - ACC1
651 7213 - ACN9
1055 3454 - ADE1

The next file, FILE2, looks like this:


>1154
ATCTCACTCGTAATTCTACATAATTTTGTTTATGCTTTTATTGTCATTTTATATATTGTCAGTCATTATCCTATTACATTATCAATCCTTGCATTTCAGC TTCCACTTATTTCGATGACCGCTTCTCATAACTTATGTCATCTTCTAACACCGTATATGATAATGTACCAGTAGTATGAC
>584
GCAAGCTTTATAGTGACAACAATAAGGTATCACTCGGTTACAATTACCCCCACTTCCCCT


What I want to do is match column 1 of FILE1 with the ># header in FILE2. So for example, 1154 from FILE1 will match up with >1154 from FILE2. Next, I want it to find the letter at the position given in column 2 (so for 1154, it will find the 10th letter, which happens to be G). If column 3 of FILE1 is +, it will print the 8 letters in front of it (i.e. the 8 letters in front of G would be TCTCACTC). But if column 3 is -, it will work from the reverse end. So for ABF2 on “584” it will take the 8 letters counting from the end of the sequence: instead of starting at the “G” at the beginning of >584, it will start at the “T” at the end. So the position of ABF2 will be 25 letters away from “T”, so the letter will be “C”. Then it will take the letters behind it, so CCACTTCC.

The output file will print out column 4 of FILE1, the top 8 letters from FILE2 and column 3 from FILE1.

The final file (FILE3) will look like this:

AAD6 TCTCACTC +
ABF2 CCACTTCC -


Could someone give me some help with this? I am new to Perl and I have been put in a situation where I have to program at a very high level.

Thanks
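
Read literally, the rule described in this post could be sketched as below. This is only a sketch under stated assumptions, not a definitive reading: `pick_demo` and `pick` are made-up names, positions are taken as 1-based, and the "-" case is one interpretation (8 letters starting at offset "length minus position"), chosen because it reproduces CCACTTCC for ABF2 when the 10 from column 2 of FILE1 is used rather than the 25 mentioned in the text. The first sequence is only the start of the >1154 entry.

```shell
# Sketch of the extraction rule only; pick_demo and pick are hypothetical names.
pick_demo() {
    awk '
    # Return the 8 letters selected by a 1-based position and a strand sign.
    function pick(s, p, sg) {
        if (sg == "+")
            return substr(s, p - 8, 8)       # the 8 letters just before position p
        return substr(s, length(s) - p, 8)   # assumed rule for the "-" strand
    }
    BEGIN {
        # Start of the >1154 sequence, position 10, strand +
        print "AAD6", pick("ATCTCACTCGTAATTCTACATAATTTTG", 10, "+"), "+"
        # The >584 sequence, position 10, strand -
        print "ABF2", pick("GCAAGCTTTATAGTGACAACAATAAGGTATCACTCGGTTACAATTACCCCCACTTCCCCT", 10, "-"), "-"
    }'
}
pick_demo
```

With the two sample rows this prints the two lines shown for FILE3 above.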
# 2  
Old 05-10-2009
I am not clear on this part -

HTML Code:
But if column 3 is -, it will work from the reverse end. So for ABF2 on “584” it will take the 8 letters counting from the end of the sequence: instead of starting at the “G” at the beginning of >584, it will start at the “T” at the end. So the position of ABF2 will be 25 letters away from “T”, so the letter will be “C”. Then it will take the letters behind it, so CCACTTCC.
# 3  
Old 05-10-2009
Try this.

Code:
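awk 'FNR==NR { a[$1] = $2 SUBSEP $3 "," $4; next }   # file1: id -> position, strand, gene
/^>/ {
    s = substr($1, 2)                                # sequence id without the leading ">"
    if (s in a) {
        getline                                      # the sequence is on the next line
        st = substr(a[s], 1, index(a[s], SUBSEP) - 1)    # position (column 2)
        sg = substr(a[s], index(a[s], SUBSEP) + 1, 1)    # strand (column 3)
        pt = substr(a[s], index(a[s], ",") + 1)          # gene name (column 4)
        if (sg == "+")
            str = substr($0, st - 8, 8)              # 8 letters before the position
        else
            str = substr($0, length($0) - st, 8)     # 8 letters, offset counted from the end
        print pt, str, sg
    }
}' file1 file2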


cheers,
Devaraj Takhellambam
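
If the sequences in FILE2 are wrapped over several lines, or if one sequence id has more than one gene in FILE1, the one-liner above would need adjusting: it reads only the line right after the header and keeps only the last FILE1 row per id. A possible variant is sketched below. It assumes whitespace inside a sequence is not significant and keeps the same interpretation of the "-" rule. `extract_genes`, `pick`, `emit`, and the array names are hypothetical, and the sample files are cut-down versions of the ones in the first post.

```shell
# Sketch only: extract_genes, pick, emit and the array names are hypothetical.
# Usage: extract_genes FILE1 FILE2
extract_genes() {
    awk '
    # Same rule as the one-liner: 1-based position, 8 letters.
    function pick(s, p, sg) {
        if (sg == "+")
            return substr(s, p - 8, 8)
        return substr(s, length(s) - p, 8)
    }
    # Print one output line per gene recorded for the sequence just read.
    function emit(   i) {
        for (i = 1; i <= n[id]; i++)
            print gname[id, i], pick(seq, gpos[id, i], gstrand[id, i]), gstrand[id, i]
    }
    # First file: keep every 4-column row, several genes per id allowed.
    FNR == NR {
        if (NF == 4 && $0 !~ /^#/) {
            k = ++n[$1]
            gpos[$1, k] = $2; gstrand[$1, k] = $3; gname[$1, k] = $4
        }
        next
    }
    # Second file: a header closes the previous record and starts a new one.
    /^>/ { emit(); id = substr($1, 2); seq = ""; next }
    # Any other line is a piece of the sequence; drop blanks and join.
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { emit() }
    ' "$1" "$2"
}

# Small demonstration with wrapped sequences.
demo_dir=$(mktemp -d)
printf '##gff-version 1\n\n1154\t10\t+\tAAD6\n584\t10\t-\tABF2\n' > "$demo_dir/file1"
printf '>1154\nATCTCACTCGTAATT\nCTACATAATTTTG\n' > "$demo_dir/file2"
printf '>584\nGCAAGCTTTATAGTGACAACAATAAGGTAT\nCACTCGGTTACAATTACCCCCACTTCCCCT\n' >> "$demo_dir/file2"
extract_genes "$demo_dir/file1" "$demo_dir/file2"
rm -rf "$demo_dir"
```

Output follows the order of the sequences in FILE2, and the whole of each sequence is held in memory before it is printed, which may matter for very large entries.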
# 4  
Old 05-10-2009
How MASSIVE are your file1 and file2? Megabytes? Gigabytes?
Also, if you are new to Perl, you should at least read up on Perl before attempting this.