Help parsing and replacing text with file name


 
# 1
08-05-2009

Hi everyone,

I'm having trouble figuring this one out. I have ~100 *.fa files, each containing multiple FASTA sequences spread over several lines, like this one (file1.fa):

>xyzsequence
atcatgcacac......
ataccgagagg.....
atataccagag.....
>abcsequence
atgagatatat.....
acacacggd.....
atcgaacac....
agttccagat....

Each sequence name sits on its own header line, starting with ">" and ending at the newline. I'm trying to figure out how to iterate through all of my files with a ".fa" extension and create a single tab-delimited table with the sequence name, a tab, and the name of the file it came from, like so:
xyzsequence file1
abcsequence file1
somsequence file2
etc...

Can anyone point me in the right direction?
Many thanks,
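A minimal sketch of the requested table, assuming every header line starts with ">" in column 1, the name is everything after the ">", and the file name should be printed without its ".fa" suffix (as in the example above). awk's built-in FILENAME variable holds the current input file, so a single awk call can scan all files without a shell loop. The function name fasta_names is hypothetical.

```shell
# fasta_names (hypothetical name) prints
# "<sequence name><TAB><file name without .fa>" for every FASTA header
# line (a line starting with ">") in the files given as arguments.
# Sequence lines are ignored.
fasta_names() {
  awk '
    /^>/ {
      name = substr($0, 2)      # drop the leading ">"
      sub(/\r$/, "", name)      # tolerate DOS (CRLF) line endings
      file = FILENAME
      sub(/.*\//, "", file)     # strip any directory part of the path
      sub(/\.fa$/, "", file)    # "file1.fa" -> "file1"
      print name "\t" file
    }
  ' "$@"
}

# Usage:
# fasta_names *.fa > table.tsv
```

The output file name table.tsv is only an example. Because awk reads the files in the order the shell expands *.fa, the table comes out grouped by file.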
# 2
08-06-2009
A bit lengthy, with a second example file:
Code:
$> ll
total 20
drwxr-xr-x 2 root root  4096 2009-08-06 09:36 .
drwxr-xr-x 3 isau users 4096 2009-08-06 09:32 ..
-rw-r--r-- 1 root root   105 2009-08-06 09:20 file1
-rw-r--r-- 1 root root   103 2009-08-06 09:36 file2
$> cat file1
>xyzsequence
atcatgcacac
ataccgagagg
atataccagag
>abcsequence
atgagatatat
acacacggd
atcgaacac
agttccagat
$> cat file2
>bbbbbbbbb
atcatgcacac
ataccgagagg
atataccagag
>ccccccccccc
atgagatatat
acacacggd
atcgaacac
agttccagat
$> for FILE in file*; do tr -d "\n" < "$FILE" | awk -v file="$FILE" '$0{print $0,file}' RS=">" >> outfile; done
$> cat outfile
xyzsequenceatcatgcacacataccgagaggatataccagag file1
abcsequenceatgagatatatacacacggdatcgaacacagttccagat file1
bbbbbbbbbatcatgcacacataccgagaggatataccagag file2
cccccccccccatgagatatatacacacggdatcgaacacagttccagat file2

Maybe someone can optimize it so that the tr is not needed and everything is done inside awk; so far I haven't managed that, heh.
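One way to drop the tr, sketched under the assumption that every file starts with a ">" header line: instead of deleting newlines and splitting on ">", let awk read normal lines, start a new record at each header, and append the sequence lines to it. The helper name fasta_joined is hypothetical; it aims to reproduce the shape of the tr | awk loop's output (name plus joined sequence, a space, the file name).

```shell
# fasta_joined (hypothetical name): for every FASTA record, print the
# header name (">" dropped) with all of its sequence lines appended,
# then a space and the file name. One awk process, no tr.
# Assumes each file starts with a ">" header line.
fasta_joined() {
  awk '
    /^>/ {
      if (rec != "") print rec, file   # flush the previous record
      rec = substr($0, 2)              # start a new record with its name
      file = FILENAME                  # remember which file it came from
      next
    }
    { rec = rec $0 }                   # append a sequence line
    END { if (rec != "") print rec, file }
  ' "$@"
}

# Usage with the two example files from the session above:
# fasta_joined file1 file2 > outfile
```

Since awk reads the files one after another, the first header of file2 flushes the last record of file1 before `file` is updated, so each record keeps the name of the file it came from.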
# 3
08-07-2009
Hi Zaxxon,

Thanks a million! I didn't want the actual sequence, just the sequence name, so I used some of your code plus bits of other things I pieced together. It's hideous and long (I know), but it works. Next week I'll try to learn how to use pipes.

Code:
grep '^>' *.fa > new
sed -e 's/\.fa:>/\t/g' new > new2
perl -e ' @cols=(1, 0); while(<>) { s/\r?\n//; @F=split /\t/, $_; print join("\t", @F[@cols]), "\n" } warn "\nChose columns ", join(", ", @cols), " for $. lines\n\n" ' new2 > new3
rm new new2
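For the piping part, here is a sketch of the same three steps as one pipeline, so the temporary files new and new2 are no longer needed. It swaps the perl column-swap for a short awk command, which is a substitution rather than the original code. Two assumptions to note: -H (always print the file name, even when only one file matches) is supported by GNU and BSD grep but is not POSIX, and "\t" in the sed replacement is a GNU sed extension, as in the original command. The function name fasta_table is hypothetical.

```shell
# fasta_table (hypothetical name): same result as the grep/sed/perl chain.
#   grep -H : "file1.fa:>xyzsequence"  (-H forces the file name prefix)
#   sed     : "file1<TAB>xyzsequence"  (\t in the replacement is GNU sed)
#   awk     : "xyzsequence<TAB>file1"  (swap the two tab-separated columns)
fasta_table() {
  grep -H '^>' "$@" |
    sed 's/\.fa:>/\t/' |
    awk -F'\t' -v OFS='\t' '{ print $2, $1 }'
}

# Usage, writing the same table as new3:
# fasta_table *.fa > new3
```

Each stage reads the previous stage's output directly through the pipe, so nothing has to be cleaned up with rm afterwards.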

 