Way to extract detail and its content above specific value problem asking


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Way to extract detail and its content above specific value problem asking
# 1  
Old 03-16-2010
Way to extract detail and its content above specific value problem asking

Input file:
Code:
>position_10 sample:68711 coords:5453-8666 number:3 type:complete len:344
MSINQYSSDFHYHSLMWQQQQQQQQHQNDVVEEKEALFEKPLTPSDVGKLNRLVIPKQHA
ERYFPLAAAAADAVEKGLLLCFEDEEGKPWRFRYSYWNSSQSYVLTKGWSRYVKEKHLDA
NRTS*
>position_4 sample:68711 coords:553-866 number:4 type:partial len:483
MSGVVRSSPGSSQPPPPPPHHPPSSPVPVTSTPVIPPIRRHLAFASTKPPFHPSDDYHRF
KITPSDVENDESDYWLLSNAEISMTDIWKTDSGIDWDYGIADVSTPPPGMGEIAPTAVDS
TPR*
>position_7 sample:68711 coords:453-86 number:2 type:partial len:214
KAAETLEVQKRRIYDITNVLEGIDLIEKPFKNRILWKGVDACPGDEDADVSVLQLQAEIE
NLALEEQALDNQIRWLFVTEEDIKSLPGFQNQTLIAVKAPHGTTLEVPDPDEAADHPQRR
TDSGIDWDYGIADVSTPPPGMGEIAPTAVDSTPR*
>position_11 sample:68711 coords:53-86 number:1 type:complete len:558
MLGDFIIRLLVLILGYTYPAFECFKTVEKNKVDIEELRFWCQYWILLALISSFERVGDFF
RAPRPLNKSLSALRSLEKQTSRGRKWPPPTPPPTPGRDSAGTFNGDDGVNIPDTIPGSPL
TDARAKLRRSNSRTQPAA*
.
.

Output file:
Code:
>position_10 sample:68711 coords:5453-8666 number:3 type:complete len:344
MSINQYSSDFHYHSLMWQQQQQQQQHQNDVVEEKEALFEKPLTPSDVGKLNRLVIPKQHA
ERYFPLAAAAADAVEKGLLLCFEDEEGKPWRFRYSYWNSSQSYVLTKGWSRYVKEKHLDA
NRTS*
>position_11 sample:68711 coords:53-86 number:1 type:complete len:558
MLGDFIIRLLVLILGYTYPAFECFKTVEKNKVDIEELRFWCQYWILLALISSFERVGDFF
RAPRPLNKSLSALRSLEKQTSRGRKWPPPTPPPTPGRDSAGTFNGDDGVNIPDTIPGSPL
TDARAKLRRSNSRTQPAA*
.
.

I would like to extract the content and detail match with below criteria:
1. header must got the "complete" word (eg. type:complete )
2. lens must above or equal to 300 (eg. len:344 and len:558, etc)
It seems like perl, awk, sed able to archive my desired goal.
Thanks a lot for any advice Smilie

Last edited by patrick87; 03-16-2010 at 04:32 AM..
# 2  
Old 03-16-2010
MySQL solution

Hope this snippet would give you an idea
Code:
use strict;
use warnings;



 open FH , "testfile" or die "$0:$!" ;
 while (<FH> )
 {
     my $line = $_ ;

     # check for a type complete
        if ( $line =~ />type complete/  )
        {
                $line =~ /([0-9]+)\s*$/ ;
                # check for greater than 300
                if ( $1 < 300)
                {
                next ;
                }

            print "$line"  ;
        while ( <FH> )
        {
            # leave the type partial
        if ( $_ !~ />type partial/ )
        {
        print  "$_" ;
        }
        else{
            last ;
        }

        }
        }
 }

Smilie
# 3  
Old 03-16-2010
Code:
awk 'BEGIN{RS=">";FS="\n"} {split($1,a," |:")} a[2]~"complete" && a[4]>=300'  urfile

# 4  
Old 03-16-2010
Another one,

Code:
awk -F: '/complete/ && $2>=300  {c=1} c++<=4' file


Last edited by dennis.jacob; 03-16-2010 at 06:40 AM..
# 5  
Old 03-16-2010
Hi rdcwayx,
Thanks for your previous suggestion Smilie
Can I ask your advice about the question this time?
I got change a bit about the header format.
Thanks again for your advice.
# 6  
Old 03-16-2010
Code:
awk 'BEGIN{RS=">";FS="\n"} {split($1,a," |:")} {if (a[9]~"complete" && a[11]>=300) print ">"$0}' ORS="" urfile

# 7  
Old 03-16-2010
Modified one based on the changes you made now

Code:
awk -F: '/complete/ && $NF>=300 {c=1} c++<=4' file

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract specific line in an html file starting and ending with specific pattern to a text file

Hi This is my first post and I'm just a beginner. So please be nice to me. I have a couple of html files where a pattern beginning with "http://www.site.com" and ending with "/resource.dat" is present on every 241st line. How do I extract this to a new text file? I have tried sed -n 241,241p... (13 Replies)
Discussion started by: dejavo
13 Replies

2. Shell Programming and Scripting

Help with remove duplicate content and only keep the first content detail

Input data_10 SSA data_2 TYUE data_3 PEOCV data_6 SSAT data_21 SSA data_19 TYUEC data_14 TYUE data_15 SSA data_32 PEOCV . . Desired Output data_10 SSA data_2 TYUE data_3 PEOCV data_6 SSAT data_19 TYUEC (9 Replies)
Discussion started by: patrick87
9 Replies

3. Shell Programming and Scripting

Extract all content that match exactly only specific word

Input: 21 templeta parent 35718 36554 . - . ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set; 21 templeta kids 35718 36554 . - . ID=_52; Parent=parent_cluster_5085.21.11; 21 templeta ... (7 Replies)
Discussion started by: patrick87
7 Replies

4. Shell Programming and Scripting

Extract specific content from data and rename its header problem asking

Input file 1: >pattern_5 GAATTCGTTCATGTAGGTTGASDASFGDSGRTYRYGHDGSDFGSDGGDSGSDGSDFGSDF ATTTAATTATGATTCATACGTCATATGTTATTATTCAATCGTATAAAATTATGTGACCTT SDFSDGSDFKSDAFLKJASLFJASKLFSJAKJFHASJKFHASJKFHASJKFHSJAKFHAW >pattern_1 AAGTCTTAAGATATCACCGTCGATTAGGTTTATACAGCTTTTGTGTTATTTAAATTTGAC... (10 Replies)
Discussion started by: patrick87
10 Replies

5. Shell Programming and Scripting

Remove specific pattern header and its content problem facing

Input file: >TRACK: Position: 1 TYPE: 1 Pos: SVAVPQRHHPGGTVFREPIIIPAIPRLVPGWNKPIIIGRHAFGDQYRATDRVIPGPGKLE LVYTPVNGEPETVKVYDFQGGGIAQTQYNTDESIRGFAHASFQMALLKGLPLYMSTKNTI LKRYDGRFKDIFQEIYESTYQKDFEAKNLWYEHRLIDDMVAQMIKSEGGFVMALKNYDGD >TRACK: Position: 1 TYPE: 2 Pos: FAHASFQMALLKGLPLYMS... (8 Replies)
Discussion started by: patrick87
8 Replies

6. Shell Programming and Scripting

Manipulate data in detail problem facing

Input Participant number: HAC Position type Location Distance_start Distance_end Range Mark 1 1 + Front 808 1083 276 2 1 + Front 1373 1636 264 3 1 - Back 1837 2047 211 Participant number: BCD Position type... (6 Replies)
Discussion started by: patrick87
6 Replies

7. Shell Programming and Scripting

Extract specific data content from a long list of data

My input: Data name: ABC001 Data length: 1000 Detail info Data Direction Start_time End_time Length 1 forward 10 100 90 1 forward 15 200 185 2 reverse 50 500 450 Data name: XFG110 Data length: 100 Detail info Data Direction Start_time End_time Length 1 forward 50 100 50 ... (11 Replies)
Discussion started by: patrick87
11 Replies

8. Shell Programming and Scripting

Extract all the content after a specific data

My input: >seq_1 DSASSTRRARRRRTPRTPSLRSRRSDVTCS >seq_3 RMRLRRWRKSCSERS*RRSN >seq_8 RTTGLSERPRLPTTASRSISSRWTR >seq_10 NELPLEKGSLDSISIE >seq_9 PNQGDAREPQAHLPRRQGPRDRPLQAYA+ QVQHRRHDHSRTQH*LCRRRQREDCDRLHR >seq_4 DRGKGQAGCRRPQEGEALVRRCS>seq_6 FA*GLAAQDGEA*SGRG My output: Extract all... (22 Replies)
Discussion started by: patrick87
22 Replies

9. Shell Programming and Scripting

Extract specific content from a file

My input file: >sequence_1 ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC ASDSFDFFDFDFFWERERERERFSDFESFSFD >sequence_2 ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS >sequence_3 VEDFGSDGSDGSDGSDGSDGSDGSDG dDFSDFSDFSDFSDFSDFSDFSDFSDF SDGFDGSFDGSGSDGSDGSDGSDGSDG My... (22 Replies)
Discussion started by: patrick87
22 Replies

10. Shell Programming and Scripting

Shell script or command help to extract specific contents from a long list of content

Hi, I got a long list of contents: >sequence_1 ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC ASDSFDFFDFDFFWERERERERFSDFESFSFD >sequence_2 ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS >sequence_3 VEDFGSDGSDGSDGSDGSDGSDGSDG dDFSDFSDFSDFSDFSDFSDFSDFSDF... (2 Replies)
Discussion started by: patrick87
2 Replies
Login or Register to Ask a Question