Parse and Join in a text file
Posted by empyrean in Shell Programming and Scripting on Thursday 12th of May 2011, 01:53:49 PM (post 302521868)

I want to parse a text file and join selected fields into a specific format. Please suggest how to get this done.


Quote:
ID US88811111-0005
OO giensis
OS giensis
SN US74811111
PT I-008, testing for the second phase
PA sandiego group, NC
PI Carozzi; Nadine (Raleigh, NC); Hargiss; Tracy (Cary, NC); Koziel; Michael G. (Raleigh, NC); Duck; Nicholas B. (Apex, NC); Carr; Brian (Raleigh, NC);
PR 20030828 US20030498518P; 20040826 US20040926819; 20070620 US20070765494;
PE US200304985AN 20070765494
P1 Compositions and methods and seeds are provided.
QAISRLEGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSEQLINQRIEEFARNQAISRLEGLSNLYVTIHEIENNTDEL KFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYT
//
ID US74811111-0005
OO giensis
OS giensis
SN US74811111
PT I-003, a gene and methods for its use
PA NIX CORPORATION RESEARCH TRIANGLE PARK, NC
PI Carozzi; Nadine (Raleigh, NC); Hargiss; Tracy (Cary, NC); Koziel; Michael G. (Raleigh, NC); Duck; Nicholas B. (Apex, NC); Carr; Brian (Raleigh, NC);
PR 20030828 US20030498518P; 20040826 US20040926819; 20070620 US20070765494;
PE US200304985AN 20070765494
P1 Compositions and methods and seeds are provided.
MDNNPNINECIPYNCLSNPEVEVLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQIEQLINQRIEEFARNQAISRL EGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYTDGRRENPCEFNRGYRDYTPLP VGYVTKELEYFPETDKVWIEIGETEGTFIVDSVELLLMEE
//

The output should be in FASTA format: for each record, a header line built from the ID, PT and PA lines, followed by the sequence. The two slashes ("//") are the dividing line between two records.

Quote:
>US88811111-0005 ; I-008, testing for the second phase ; sandiego group, NC
QAISRLEGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSEQLINQRIEEFARNQAISRLEGLSNLYVTIHEIENNTDEL KFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYT

>US74811111-0005 ; I-003, a gene and methods for its use ; NIX CORPORATION RESEARCH TRIANGLE PARK, NC
MDNNPNINECIPYNCLSNPEVEVLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQIEQLINQRIEEFARNQAISRL EGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYTDGRRENPCEFNRGYRDYTPLP VGYVTKELEYFPETDKVWIEIGETEGTFIVDSVELLLMEE
I have 50,000 records like this in a single file, and all of them should be converted to FASTA format.
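
One possible approach, as a minimal sketch in Python rather than a solution tested on the full file. It assumes every record ends with "//", that the tagged lines start with one of the two-character codes seen in the sample (ID, OO, OS, SN, PT, PA, PI, PR, PE, P1) followed by a space, and that everything after the tagged lines is sequence data. The names `to_fasta`, `format_entry` and `KNOWN_TAGS` are made up for this example.

```python
import sys

# Two-character tags seen in the sample records (an assumption about the full file).
KNOWN_TAGS = {"ID", "OO", "OS", "SN", "PT", "PA", "PI", "PR", "PE", "P1"}
# Fields that make up the FASTA header, in output order.
HEADER_TAGS = ("ID", "PT", "PA")


def format_entry(fields, sequence):
    """Build one FASTA entry: '>ID ; PT ; PA' followed by the sequence lines."""
    header = ">" + " ; ".join(fields.get(tag, "") for tag in HEADER_TAGS)
    return "\n".join([header] + sequence)


def to_fasta(lines):
    """Yield one FASTA entry per record from an iterable of input lines."""
    fields, sequence = {}, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "//":
            # End of record: emit what was collected, then reset.
            if fields or sequence:
                yield format_entry(fields, sequence)
            fields, sequence = {}, []
            continue
        tag, _, value = line.partition(" ")
        if not sequence and tag in KNOWN_TAGS:
            fields[tag] = value.strip()
        else:
            # Anything after the tagged lines is sequence data, kept as written.
            sequence.append(line)
    # Emit a final record that is not terminated by "//".
    if fields or sequence:
        yield format_entry(fields, sequence)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the file named on the command line and write FASTA to stdout,
    # with a blank line between entries as in the sample output.
    with open(sys.argv[1]) as handle:
        for entry in to_fasta(handle):
            print(entry + "\n")
```

Saved under a hypothetical name such as to_fasta.py, it could be run as `python to_fasta.py input.txt > output.fa`. The file is read line by line, so 50,000 records should not need to fit in memory at once. Spaces inside the sequence lines are left untouched to match the sample output; if a downstream tool rejects them, they would have to be stripped in `format_entry`.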
 
