Extract specific content from a file


 
# 1  
Old 10-07-2009

My input file:
Code:
>sequence_1
ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC
ASDSFDFFDFDFFWERERERERFSDFESFSFD
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS
>sequence_3
VEDFGSDGSDGSDGSDGSDGSDGSDG
dDFSDFSDFSDFSDFSDFSDFSDFSDF
SDGFDGSFDGSGSDGSDGSDGSDGSDG

My desired output file:
Code:
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS

I only want to extract the sequence_2 header and its content.
Does anybody have an idea how to do it?
Will awk be faster if the file contains a long list of records?
Thanks for all of your suggestions.

Last edited by radoulov; 10-07-2009 at 07:55 AM.. Reason: Use code tags, please!
# 2  
Old 10-07-2009
Code:
sed -n -e '/>sequence_3/q' -e '/>sequence_2/,/>sequence_3/p' t1

Put your input & output in code tags for better visibility.
# 3  
Old 10-07-2009
Hi thegeek,
Thanks for your suggestion. It works nicely.
Could you briefly explain how the code works?
Code:
sed -n -e '/>sequence_3/q' -e '/>sequence_2/,/>sequence_3/p' t1

For example, if I have a long list of records and only want to extract specific ones based on a header of interest, can I use the sed code you recommended as well?

# 4  
Old 10-09-2009
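The idea is very simple:

Print from sequence_2 to sequence_3, and quit as soon as the sequence_3 pattern is found. Because the q command comes first, the sequence_3 header itself is never printed.

So yes, I would recommend it for long files too: sed quits at sequence_3, so the rest of the file is never read, which makes it efficient as well.

> In sed terminology, />sequence_2/,/>sequence_3/ is an address range whose two endpoints are pattern (regular-expression) addresses.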
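
As a rough sketch of how the same idea could be reused for other headers, the two patterns can be passed in as shell parameters. The helper name extract_between, the temporary file, and the ^...$ anchors are assumptions added here for illustration, not part of the one-liner above; headers that contain regular-expression metacharacters would need escaping first.

```shell
#!/bin/sh
# Sample input copied from post #1, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>sequence_1
ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC
ASDSFDFFDFDFFWERERERERFSDFESFSFD
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS
>sequence_3
VEDFGSDGSDGSDGSDGSDGSDGSDG
dDFSDFSDFSDFSDFSDFSDFSDFSDF
SDGFDGSFDGSGSDGSDGSDGSDGSDG
EOF

# extract_between START STOP FILE  (hypothetical helper, not from the thread)
# Prints from the line matching ^START$ up to, but not including, the line
# matching ^STOP$, then quits without reading the rest of FILE.
extract_between() {
    sed -n -e "/^$2\$/q" -e "/^$1\$/,/^$2\$/p" "$3"
}

result=$(extract_between '>sequence_2' '>sequence_3' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

The stop header still has to be known in advance, exactly as in the original command.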
# 5  
Old 10-09-2009
Thanks a lot, thegeek.
I understand it now.
hehe...
Do you have any idea how to solve this thread:
https://www.unix.com/shell-programmin...#post302360533
It seems more difficult and complicated.
Thanks a lot for your advice.
# 6  
Old 10-10-2009
Hi Radoulov

Once again I am baffled by the brevity of your code!

You explained this one to me a few days ago in another post:

Code:
awk '/_3$/{exit}/_2$/{f=1}f' file

I just don't get these two at all, though. Why do they work?

Code:
awk '/_3$/{exit}/_2$/,0' infile

Or:

Code:
awk '/_3$/{exit}/_2$/,_' infile


What are the ,0 and ,_ about?

Last edited by steadyonabix; 10-10-2009 at 03:55 AM.. Reason: code tags
# 7  
Old 10-10-2009
Quote:
Originally Posted by steadyonabix
What is the ,0 and ,_ about?
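In a range pattern (start,end), the end is just an expression that awk tests for truth. 0 is always false, and _ is a variable that was never set, so it is empty and also evaluates as false.
The range therefore never closes: awk prints from the first pattern (/_2$/) to the end of the input, unless the exit on /_3$/ stops it first.

That's why I like radoulov's solutions: you have to ask yourself why they work.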
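
A quick way to check this is to run the three variants side by side. This is only a sketch: the temporary file and the variable names (with_flag, with_zero, with_unset) are assumptions for illustration, and the sample data is the input from post #1.

```shell
#!/bin/sh
# Sample input copied from post #1, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>sequence_1
ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC
ASDSFDFFDFDFFWERERERERFSDFESFSFD
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS
>sequence_3
VEDFGSDGSDGSDGSDGSDGSDGSDG
dDFSDFSDFSDFSDFSDFSDFSDFSDF
SDGFDGSFDGSGSDGSDGSDGSDGSDG
EOF

# Flag version: f becomes 1 at the _2 header, and the bare pattern f
# prints every line while f is non-zero.
with_flag=$(awk '/_3$/{exit} /_2$/{f=1} f' "$sample")

# Range version: the end expression 0 is always false, so the range
# opened at the _2 header never closes.
with_zero=$(awk '/_3$/{exit} /_2$/,0' "$sample")

# Same range, but the end expression is the unset variable _, which
# is also false.
with_unset=$(awk '/_3$/{exit} /_2$/,_' "$sample")

printf '%s\n' "$with_zero"
rm -f "$sample"
```

In all three commands the exit rule is listed first, so awk stops at the _3 header before any printing rule sees that line.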