The UNIX and Linux Forums  

#1 | 08-05-2009 | mycoguy | Registered User | Join Date: Jul 2009 | Posts: 4
Help parsing and replacing text with file name

Hi everyone,

I'm having trouble figuring this one out. I have ~100 *.fa files, each containing several multi-line FASTA sequences. For example, file1.fa looks like this:

>xyzsequence
atcatgcacac......
ataccgagagg.....
atataccagag.....
>abcsequence
atgagatatat.....
acacacggd.....
atcgaacac....
agttccagat....

The name of each sequence starts with a ">" and ends at the newline. I'm trying to figure out how to iterate through all of my files with a ".fa" extension and create a single tab-delimited table with the name of the sequence, a tab, and the name of the file it came from. Like so:
xyzsequence file1
abcsequence file1
somsequence file2
etc...

Can anyone point me in the right direction?
Many thanks,
#2 | 08-06-2009 | zaxxon | Moderator (Forum Staff) | Join Date: Sep 2007 | Location: Germany | Posts: 2,313
To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut and paste: edit out any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums

*********************************************************

A bit lengthy with a 2nd example file:

Code:
$> ll
total 20
drwxr-xr-x 2 root root  4096 2009-08-06 09:36 .
drwxr-xr-x 3 isau users 4096 2009-08-06 09:32 ..
-rw-r--r-- 1 root root   105 2009-08-06 09:20 file1
-rw-r--r-- 1 root root   103 2009-08-06 09:36 file2
$> cat file1
>xyzsequence
atcatgcacac
ataccgagagg
atataccagag
>abcsequence
atgagatatat
acacacggd
atcgaacac
agttccagat
$> cat file2
>bbbbbbbbb
atcatgcacac
ataccgagagg
atataccagag
>ccccccccccc
atgagatatat
acacacggd
atcgaacac
agttccagat
$> for FILE in file*; do tr -d "\n" < "$FILE" | awk -v file="$FILE" '$0{print $0,file}' RS=">" >> outfile; done
$> cat outfile
xyzsequenceatcatgcacacataccgagaggatataccagag file1
abcsequenceatgagatatatacacacggdatcgaacacagttccagat file1
bbbbbbbbbatcatgcacacataccgagaggatataccagag file2
cccccccccccatgagatatatacacacggdatcgaacacagttccagat file2

Maybe someone can optimize this so that the tr is not needed and everything happens inside the awk. I was not able to do that just now, heh.
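
One possible awk-only variant, offered as a rough sketch rather than a definitive answer: it skips tr, matches only the header lines that start with ">", and prints the sequence name and the file name separated by a tab, which is the table the original question asked for. The scratch directory, the shortened sample files, and the output name outfile are assumptions made up for the illustration, as is the choice to strip a ".fa" extension from the file name.

```shell
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small sample files modelled on the ones in this thread (shortened).
printf '>xyzsequence\natcatgcacac\n>abcsequence\natgagatatat\n' > file1.fa
printf '>bbbbbbbbb\natcatgcacac\n>ccccccccccc\natgagatatat\n' > file2.fa

# For every header line (starting with ">"), print the sequence name and
# the file name without its .fa extension, separated by a tab.
awk -v OFS='\t' '
  /^>/ {
    name = substr($0, 2)      # drop the leading ">"
    base = FILENAME           # awk sets FILENAME to the current input file
    sub(/\.fa$/, "", base)    # strip the extension
    print name, base
  }
' *.fa > outfile

cat outfile
```

Because awk reads all the files itself and tracks the current one in FILENAME, no shell loop is needed, and redirecting with > instead of >> avoids appending to a stale outfile from an earlier run.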
#3 | 08-07-2009 | mycoguy | Registered User | Join Date: Jul 2009 | Posts: 4
Hi Zaxxon,

Thanks a million! I didn't want the actual sequence, just the sequence name, so I used some of your code and bits of other things that I pieced together. This is hideous and long (I know), but it works. Next week I'll try to learn to pipe.

Code:
grep '^>' *.fa > new
sed -e 's/\.fa:>/\t/g' new > new2
perl -e ' @cols=(1, 0); while(<>) { s/\r?\n//; @F=split /\t/, $_; print join("\t", @F[@cols]), "\n" } warn "\nChose columns ", join(", ", @cols), " for $. lines\n\n" ' new2 > new3
rm new new2
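
For anyone curious what a piped version of the same grep and sed idea might look like, here is a hedged sketch with no temporary files. It assumes a grep that supports -H (GNU and BSD grep both do) so the file name is printed even when only one file matches, and it swaps the two columns inside sed instead of calling perl. The scratch directory, the shortened sample files, and the output name table.txt are made up for the illustration.

```shell
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Small sample files modelled on the ones in this thread (shortened).
printf '>xyzsequence\natcatgcacac\n>abcsequence\natgagatatat\n' > file1.fa
printf '>somsequence\natcatgcacac\n' > file2.fa

# A literal tab, so the sed expression does not rely on "\t" support.
tab=$(printf '\t')

# grep emits lines such as "file1.fa:>xyzsequence"; sed captures the file
# name (without .fa) and the sequence name, then prints them swapped.
grep -H '^>' *.fa |
  sed "s/^\(.*\)\.fa:>\(.*\)\$/\2${tab}\1/" > table.txt

cat table.txt
```

The pipe hands grep's output straight to sed, so the intermediate files new and new2 and the cleanup with rm are no longer needed.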
