
Extract specific content from a file


 
# 1  
Old 10-07-2009
Extract specific content from a file

My input file:
Code:
>sequence_1
ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC
ASDSFDFFDFDFFWERERERERFSDFESFSFD
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS
>sequence_3
VEDFGSDGSDGSDGSDGSDGSDGSDG
dDFSDFSDFSDFSDFSDFSDFSDFSDF
SDGFDGSFDGSGSDGSDGSDGSDGSDG

My desired output file:
Code:
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS

I only want to extract the header of sequence_2 and its content.
Does anybody have an idea how to do it?
Will awk be faster if the file has a long list of contents?
Thanks for all of your suggestions.

Last edited by radoulov; 10-07-2009 at 07:55 AM.. Reason: Use code tags, please!
# 2  
Old 10-07-2009
Code:
sed -n -e '/>sequence_3/q' -e '/>sequence_2/,/>sequence_3/p' t1

Put your input & output in code tags for better visibility.
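
For anyone wondering how the sed one-liner above works, here is the same idea spelled out as a commented sketch. The function name `extract_until`, its parameters and the sample data are illustrative and not part of the original post. Like the one-liner, it assumes the wanted record is followed directly by the stop header (or by end of file).

```shell
#!/bin/sh
# Hypothetical helper: print the record that starts at header $1 and stop
# reading as soon as header $2 is seen. Both headers are anchored with ^ and $
# so that ">sequence_2" does not also match ">sequence_20".
extract_until() {
    header=$1 next_header=$2 file=$3
    # 1st expression: quit (printing nothing, because of -n) at the stop header.
    # 2nd expression: print everything from the wanted header onwards.
    sed -n \
        -e "/^${next_header}\$/q" \
        -e "/^${header}\$/,\$p" \
        "$file"
}

# Sample data in the same layout as the input in post #1 (shortened)
sample=$(mktemp)
cat > "$sample" <<'EOF'
>sequence_1
AAAA
>sequence_2
BBBB
CCCC
>sequence_3
DDDD
EOF

result=$(extract_until '>sequence_2' '>sequence_3' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because the `q` expression comes first, sed stops reading at the stop header, which is why this approach tends to be quick on long files when the wanted record is near the top.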
# 3  
Old 10-07-2009
Use gawk, nawk or /usr/xpg4/bin/awk on Solaris:

Code:
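awk '# r accumulates the current record; p is the header pattern to keep
END { if (r ~ p) print r }                    # flush the last record
/^>sequence/ { if (r ~ p) print r; r = x }    # new header: flush, then reset (x is unset)
{ r = (r ? r RS : x) $0 }                     # append the current line to the record
' p="sequence_2" infile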



---------- Post updated at 12:47 PM ---------- Previous update was at 12:41 PM ----------

Yes,
thegeek's sed approach should be faster.
Assuming progressive sequence numbers with fixed format,
you could add parameters:

Code:
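start="sequence_2"
# Build the next header name (sequence_3); the bare number alone would match any line containing a 3
stop="${start%_*}_$(( ${start##*_} + 1 ))"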

sed -n "
  /$stop/q
  /$start/,/$stop/p
  " infile



---------- Post updated at 12:51 PM ---------- Previous update was at 12:47 PM ----------

A similar approach with awk:
Code:
start="sequence_2"
stop="${start%_*}_$(( ${start##*_} + 1 ))"

awk '$0 ~ stop { exit }
  $0 ~ start, $0 ~ stop {
    if ($0 !~ stop) print
    }' start="$start" stop="$stop" infile



---------- Post updated at 12:54 PM ---------- Previous update was at 12:51 PM ----------

Notice that the sed and the second awk versions assume an input in numeric (by sequence number) order just like the example in the original post.
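
If the headers are not in numeric order, a flag-based awk filter avoids that assumption, at the cost of reading the whole file. This is only a minimal sketch: the variable name `want` and the sample data are illustrative, and the header is compared as an exact string rather than as a regex.

```shell
#!/bin/sh
# Sample data with headers deliberately out of order
sample=$(mktemp)
cat > "$sample" <<'EOF'
>sequence_3
DDDD
>sequence_20
EEEE
>sequence_2
BBBB
CCCC
>sequence_1
AAAA
EOF

# On every header line, set the flag f to true only if the header is exactly
# the requested one; the bare pattern "f" then prints lines while f is true.
result=$(awk -v want='>sequence_2' '
    /^>/ { f = ($0 == want) }
    f
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

The exact comparison also keeps `>sequence_20` from being picked up when `>sequence_2` is requested.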
# 4  
Old 10-07-2009
Code:
awk '/_3$/{exit}/_2$/{f=1}f' file

# 5  
Old 10-07-2009
Grep if right flavor

If your grep supports -A (--after-context), you could try this (it assumes the record has exactly two content lines, as sequence_2 does here):
Code:
grep -A 2 "sequence_2" infile

My Ubuntu distro has it but I know I had to grab it for my Solaris boxes.
# 6  
Old 10-07-2009
Quote:
Originally Posted by danmero
Code:
awk '/_3$/{exit}/_2$/{f=1}f' file

Well,
for a one-shot solution it could even be:

Code:
awk '/_3$/{exit}/_2$/,0' infile

Or:

Code:
awk '/_3$/{exit}/_2$/,_' infile
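
In both forms, `/_2$/,...` is a range pattern whose end condition is always false: `0` is a false constant and `_` is an uninitialized variable, so once the range opens it never closes and awk keeps printing until the `exit` rule fires. A commented sketch of the same logic, with illustrative sample data:

```shell
#!/bin/sh
sample=$(mktemp)
cat > "$sample" <<'EOF'
>sequence_1
AAAA
>sequence_2
BBBB
CCCC
>sequence_3
DDDD
EOF

result=$(awk '
    /_3$/ { exit }   # stop reading once the following header is reached
    /_2$/, 0         # range opens at the _2 header; the end condition 0 never matches
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```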


Last edited by radoulov; 10-07-2009 at 09:25 AM..
# 7  
Old 10-07-2009
Hi thegeek,
Thanks for your suggestion. It worked nicely.
Could you briefly explain how the code works?
Code:
sed -n -e '/>sequence_3/q' -e '/>sequence_2/,/>sequence_3/p' t1

For example, if I have a long list of contents and only want to extract specific records based on the header of interest, can I use the sed code you recommended as well?

Quote:
Originally Posted by thegeek
Code:
sed -n -e '/>sequence_3/q' -e '/>sequence_2/,/>sequence_3/p' t1

Put your input & output in code tags for better visibility.
