Parse and Join in a text file Post: 302521868

Posted by empyrean on Thursday 12th of May 2011 01:53:49 PM in Shell Programming and Scripting

I want to parse a text file and join selected fields into a specific format. Please suggest how to get this done.

Quote:

ID US88811111-0005
OO giensis
OS giensis
SN US74811111
PT I-008, testing for the second phase
PA sandiego group, NC
PI Carozzi; Nadine (Raleigh, NC); Hargiss; Tracy (Cary, NC); Koziel; Michael G. (Raleigh, NC); Duck; Nicholas B. (Apex, NC); Carr; Brian (Raleigh, NC);
PR 20030828 US20030498518P; 20040826 US20040926819; 20070620 US20070765494;
PE US200304985AN 20070765494
P1 Compositions and methods and seeds are provided.
QAISRLEGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSEQLINQRIEEFARNQAISRLEGLSNLYVTIHEIENNTDEL KFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYT
//
ID US74811111-0005
OO giensis
OS giensis
SN US74811111
PT I-003, a gene and methods for its use
PA NIX CORPORATION RESEARCH TRIANGLE PARK, NC
PI Carozzi; Nadine (Raleigh, NC); Hargiss; Tracy (Cary, NC); Koziel; Michael G. (Raleigh, NC); Duck; Nicholas B. (Apex, NC); Carr; Brian (Raleigh, NC);
PR 20030828 US20030498518P; 20040826 US20040926819; 20070620 US20070765494;
PE US200304985AN 20070765494
P1 Compositions and methods and seeds are provided.
MDNNPNINECIPYNCLSNPEVEVLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQIEQLINQRIEEFARNQAISRL EGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYTDGRRENPCEFNRGYRDYTPLP VGYVTKELEYFPETDKVWIEIGETEGTFIVDSVELLLMEE
//

The output should be in FASTA format: for each record, a header line built from the ID, PT and PA fields, followed by the sequence. The two slashes ("//") are the dividing line between one record and the next.

Quote:

>US88811111-0005 ; I-008, testing for the second phase ; sandiego group, NC
QAISRLEGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSEQLINQRIEEFARNQAISRLEGLSNLYVTIHEIENNTDEL KFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYT

>US74811111-0005 ; I-003, a gene and methods for its use ; NIX CORPORATION RESEARCH TRIANGLE PARK, NC
MDNNPNINECIPYNCLSNPEVEVLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQIEQLINQRIEEFARNQAISRL EGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYTDGRRENPCEFNRGYRDYTPLP VGYVTKELEYFPETDKVWIEIGETEGTFIVDSVELLLMEE

I have 50,000 records like this in a single file, and all of them need to be converted to FASTA format.
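
One possible approach is sketched below with awk. It is not a tested solution for the real file, only an illustration that rests on a few assumptions: every tag is exactly two characters followed by a blank, a record ends with a line containing only "//", and any non-empty line without a tag is sequence data. The function name to_fasta is made up for this example.

```shell
#!/bin/sh
# to_fasta is a hypothetical helper name: it reads the tagged records on
# stdin and writes one FASTA entry per record on stdout.
to_fasta() {
    awk '
        # Drop the two-character tag and the blanks that follow it.
        function value(line) { sub(/^..[ \t]+/, "", line); return line }

        # Print the collected record (if any), then reset for the next one.
        function emit() {
            if (id != "") printf ">%s ; %s ; %s\n%s\n\n", id, pt, pa, seq
            id = pt = pa = seq = ""
        }

        $1 == "//" && NF == 1 { emit(); next }          # end of record
        /^ID /                { id = value($0); next }
        /^PT /                { pt = value($0); next }
        /^PA /                { pa = value($0); next }
        /^[A-Z][A-Z0-9] /     { next }                  # other tags: ignored
        NF { seq = (seq == "" ? $0 : seq "\n" $0) }     # untagged: sequence
        END { emit() }                                  # record without final "//"
    '
}
```

It could be run as `to_fasta < input.txt > output.fa` (both file names are placeholders). Sequence lines are copied as they are, so blanks inside a sequence are preserved, as in the sample output above. A sequence line whose third character happens to be a blank would be mistaken for a tagged line, so that assumption should be checked against the real data.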

LEARN ABOUT DEBIAN

bio::seqio::tab

Bio::SeqIO::tab(3pm)					User Contributed Perl Documentation				      Bio::SeqIO::tab(3pm)

NAME

       Bio::SeqIO::tab - nearly raw sequence file input/output stream. Reads/writes id"\t"sequence"\n"

SYNOPSIS

       Do not use this module directly.  Use it via the Bio::SeqIO class.

DESCRIPTION

       This object can transform Bio::Seq objects to and from tabbed flat file databases.

       It is very useful when doing large scale stuff using the Unix command line utilities (grep, sort, awk, sed, split, you name it). Imagine
       that you have a format converter 'seqconvert' along the following lines:

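	 use Bio::SeqIO;
	 use Getopt::Long;
	 GetOptions('from=s' => \my $from, 'to=s' => \my $to);  # e.g. -from fasta -to tab
	 my $in  = Bio::SeqIO->newFh(-fh => \*STDIN,  -format => $from);
	 my $out = Bio::SeqIO->newFh(-fh => \*STDOUT, -format => $to);
	 print $out $_ while <$in>;  # each sequence read is written in the target format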

       then you can very easily filter sequence files for duplicates as:

	 $ seqconvert < foo.fa -from fasta -to tab | sort -u |
	      seqconvert -from tab -to fasta > foo-unique.fa

       Or grep [-v] for certain sequences with:

	 $ seqconvert < foo.fa -from fasta -to tab | grep -v '^S[a-z]*control' |
	      seqconvert -from tab -to fasta > foo-without-controls.fa

       Or chop up a huge file with sequences into smaller chunks with:

	 $ seqconvert < all.fa -from fasta -to tab | split -l 10 - chunk-
	 $ for i in chunk-*; do seqconvert -from tab -to fasta < $i > $i.fa; done
	 # (this creates files chunk-aa.fa, chunk-ab.fa, ..., each containing 10
	 # sequences)

FEEDBACK

   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one
       of the Bioperl mailing lists.  Your participation is much appreciated.

	 bioperl-l@bioperl.org			- General discussion
	 http://bioperl.org/wiki/Mailing_lists	- About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address
       it. Please include a thorough description of the problem with code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution.  Bug reports can be submitted via the
       web:

	 https://redmine.open-bio.org/projects/bioperl/

AUTHORS

       Philip Lijnzaad, p.lijnzaad@med.uu.nl

APPENDIX

       The rest of the documentation details each of the object methods.  Internal methods are usually preceded with a _

   next_seq
	Title	: next_seq
	Usage	: $seq = $stream->next_seq()
	Function: returns the next sequence in the stream
	Returns : Bio::Seq object
	Args	:

   write_seq
	Title	: write_seq
	Usage	: $stream->write_seq($seq)
	Function: writes the $seq object into the stream
	Returns : 1 for success and 0 for error
	Args	: Bio::Seq object

perl v5.14.2							    2012-03-02						      Bio::SeqIO::tab(3pm)
