Extract specific content from data and rename its header problem asking (Post 302406848 by alister on Wednesday 24th of March 2010 04:22:33 AM)
Hi, patrick87:
Code:
awk 'FNR==NR {if (/^>/) p=substr($0,2); else a[p]=a[p] $0; next} {printf(">%s_0.%02u\n%s\n", $1, ++i[$1], substr(a[$1], $2, $3-$2+1))}' f1 f2

While processing the first file (FNR==NR), if a line begins with ">", grab everything that follows it and store it in p, the pattern name. If a line does not begin with a ">", then it is data for the current pattern, p; append the line to a[p], that pattern's entry in array a. Repeat until done with the first file.
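
To see what the first pass builds, here is a small sketch on made-up data (the names seq_1 and seq_2 and their sequences are invented for illustration). Since only one file is read, the FNR==NR test is dropped and an END block prints two entries of a:

```shell
# Hypothetical two-record input; the names and sequences are invented.
fasta=$(mktemp)
printf '%s\n' '>seq_1' 'AAAA' 'CCCC' '>seq_2' 'GGGG' > "$fasta"

# Header lines set p; every other line is appended to a[p].
joined=$(awk '/^>/ {p=substr($0,2); next} {a[p]=a[p] $0}
  END {print a["seq_1"]; print a["seq_2"]}' "$fasta")
printf '%s\n' "$joined"
rm -f "$fasta"
```

This should print AAAACCCC and then GGGG: the two data lines under seq_1 end up joined into one string, which is what lets substr() later index across line breaks.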

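For the second file, each line gives a pattern name in the first field and the start and end positions in the second and third fields. substr(a[$1], $2, $3-$2+1) extracts that range from a[$1]: positions are 1-based, the end position is included, and because the third argument of substr() is a length, it is computed as $3-$2+1. A counter kept per pattern name in the i array, i[$1], is incremented for every line and printed as a zero-padded two-digit suffix (_0.01, _0.02, and so on) on the new header.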
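
A quick way to try the whole command is on throwaway data. The file contents below are invented for illustration (only the names f1 and f2 come from the command above), so treat this as a sketch, not as your real input:

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)

# f1: ">" header lines followed by sequence data, possibly on several lines.
printf '%s\n' '>seq_1' 'ABCDEFGHIJ' 'KLMNOPQRST' '>seq_2' 'abcdefghij' > "$tmp/f1"

# f2: name, start position, end position (1-based, end included).
printf '%s\n' 'seq_1 3 6' 'seq_1 9 12' 'seq_2 1 4' > "$tmp/f2"

# The command from the post, split over two lines for readability.
result=$(awk 'FNR==NR {if (/^>/) p=substr($0,2); else a[p]=a[p] $0; next}
  {printf(">%s_0.%02u\n%s\n", $1, ++i[$1], substr(a[$1], $2, $3-$2+1))}' "$tmp/f1" "$tmp/f2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With that input the output should be:

>seq_1_0.01
CDEF
>seq_1_0.02
IJKL
>seq_2_0.01
abcd

Note that the second range for seq_1 (9 to 12) crosses the line break in f1, and the counter restarts at 01 for seq_2.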

Cheers,
Alister

Last edited by alister; 03-24-2010 at 05:31 AM..
 
