Parse and Join in a text file (Post 302521908 by ctsgnb on Thursday 12th of May 2011 05:22:30 PM)
Give this a try:
Code:
nawk '{x=y;y=$0}/^ID/{if (w) print w;sub(".*"$2,">"$2);w=$0}/^P[AT]/{sub(".*"$2,$2);w=w?w" \; "$0:$0}/^\/\//{print w"\n" x;w=z}' infile

or
Code:
nawk '{x=y;y=$0}/^ID|^P[AT]/{sub(".*"$2,$2);v=$0;w=w?w" \; "v:">"v}/^\/\//{print w"\n" x;w=z}' infile

Code:
# cat tst
ID US88811111-0005
OO giensis
OS giensis
SN US74811111
PT I-008, testing for the second phase
PA sandiego group, NC
PI Carozzi; Nadine (Raleigh, NC); Hargiss; Tracy (Cary, NC); Koziel; Michael G. (Raleigh, NC); Duck; Nicholas B. (Apex, NC); Carr; Brian (Raleigh, NC);
PR 20030828 US20030498518P; 20040826 US20040926819; 20070620 US20070765494;
PE US200304985AN 20070765494
P1 Compositions and methods and seeds are provided.
QAISRLEGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSEQLINQRIEEFARNQAISRLEGLSNLYVTIHEIENNTDEL KFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYT
//
ID US74811111-0005
OO giensis
OS giensis
SN US74811111
PT I-003, a gene and methods for its use
PA NIX CORPORATION RESEARCH TRIANGLE PARK, NC
PI Carozzi; Nadine (Raleigh, NC); Hargiss; Tracy (Cary, NC); Koziel; Michael G. (Raleigh, NC); Duck; Nicholas B. (Apex, NC); Carr; Brian (Raleigh, NC);
PR 20030828 US20030498518P; 20040826 US20040926819; 20070620 US20070765494;
PE US200304985AN 20070765494
P1 Compositions and methods and seeds are provided.
MDNNPNINECIPYNCLSNPEVEVLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQIEQLINQRIEEFARNQAISRL EGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYTDGRRENPCEFNRGYRDYTPLP VGYVTKELEYFPETDKVWIEIGETEGTFIVDSVELLLMEE
//

Code:
# nawk '{x=y;y=$0}/^ID/{if (w) print w;sub(".*"$2,">"$2);w=$0}/^P[AT]/{sub(".*"$2,$2);w=w?w" \; "$0:$0}/^\/\//{print w"\n" x ;w=z}' tst
>US88811111-0005 ; I-008, testing for the second phase ; sandiego group, NC
QAISRLEGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSEQLINQRIEEFARNQAISRLEGLSNLYVTIHEIENNTDEL KFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYT
>US74811111-0005 ; I-003, a gene and methods for its use ; NIX CORPORATION RESEARCH TRIANGLE PARK, NC
MDNNPNINECIPYNCLSNPEVEVLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQIEQLINQRIEEFARNQAISRL EGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYTDGRRENPCEFNRGYRDYTPLP VGYVTKELEYFPETDKVWIEIGETEGTFIVDSVELLLMEE

Code:
# nawk '{x=y;y=$0}/^ID|^P[AT]/{sub(".*"$2,$2);v=$0;w=w?w" \; "v:">"v}/^\/\//{print w"\n"x;w=z}' tst
>US88811111-0005 ; I-008, testing for the second phase ; sandiego group, NC
QAISRLEGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSEQLINQRIEEFARNQAISRLEGLSNLYVTIHEIENNTDEL KFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYT
>US74811111-0005 ; I-003, a gene and methods for its use ; NIX CORPORATION RESEARCH TRIANGLE PARK, NC
MDNNPNINECIPYNCLSNPEVEVLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQIEQLINQRIEEFARNQAISRL EGLSNLYVTIHEIENNTDELKFSNCVEEEIYPNNTVTCNDYTVNQEEYGGAYTSRNRGYNEAPSVPADYASVYEEKSYTDGRRENPCEFNRGYRDYTPLP VGYVTKELEYFPETDKVWIEIGETEGTFIVDSVELLLMEE
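
For readability, here is a commented sketch of the same idea as the second one-liner, written as a shell function. The function name to_fasta and the awk names value, header and prev are mine, not from the original commands, and I use plain awk here instead of nawk. It assumes, as in the tst sample, that each tag is followed by blanks and that the sequence sits on the single line just before "//". One difference: the one-liners use $2 as a regex inside sub(), which can misbehave if that field contains regex metacharacters or "&", so this sketch strips the tag with a fixed pattern instead.

```shell
#!/bin/sh
# Sketch only: join the ID, PT and PA values of each record into one
# ">"-prefixed header line, then print the line that precedes "//".
# Assumption: the sequence occupies exactly one line per record.
to_fasta() {
    awk '
        # Remove the leading tag and the blanks that follow it.
        function value(line) {
            sub(/^[^ \t]+[ \t]+/, "", line)
            return line
        }

        # ID, PT and PA lines: append the value to the header being built.
        /^(ID|PT|PA)[ \t]/ {
            v = value($0)
            header = (header == "" ? ">" v : header " ; " v)
            next
        }

        # End of record: print the header, then the last remembered line.
        /^\/\// {
            print header
            print prev
            header = ""
            next
        }

        # Any other line: remember it, so the line before "//" is available.
        { prev = $0 }
    ' "$@"
}
```

Call it as "to_fasta tst" (or pipe data into it with no argument). If a sequence can span several lines, prev would have to accumulate them instead of keeping only the last one.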


Last edited by ctsgnb; 05-12-2011 at 06:38 PM.
 
