Extract specific content from a file


 
# 15  
10-12-2009
Quote:
Originally Posted by steadyonabix
I see. Why the single comma before the 0, though? What does it mean in this context?

Code:
awk '/_3$/{exit}/_2$/,0' infile

It's the awk range pattern, from Effective AWK Programming:

Quote:
A range pattern is made of two patterns separated by a comma, in the form ‘begpat, endpat’. It is used to match ranges of consecutive input records. The first pattern, begpat, controls where the range begins, while endpat controls where the pattern ends. For example, the following:
awk '$1 == "on", $1 == "off"' myfile
prints every record in ‘myfile’ between ‘on’/‘off’ pairs, inclusive.
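In the code above, it means:

print every record from the one that matches the _2$ pattern to the end of the input. The end pattern is 0, which is always false, so the range never closes on its own (0 -> false -> never -> EOF).

And of course, we exit early, at the first record matching _3$, because of the {exit} action that comes before the range.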
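A minimal sketch that runs the one-liner on a small, made-up file, so you can watch the range open at the _2 line and the exit fire at the _3 line. The file name infile and its contents are assumptions for illustration, not the original poster's data.

```shell
#!/bin/sh
# Made-up sample input: record names ending in _1, _2 and _3.
printf '%s\n' 'seq_1' 'aaa' 'seq_2' 'bbb' 'ccc' 'seq_3' 'ddd' > infile

# /_3$/{exit}  stop reading at the first line ending in _3; this rule comes
#              first, so the _3 line itself is never printed.
# /_2$/,0      range pattern: it opens at the first line ending in _2, and
#              because the end pattern 0 is always false it never closes,
#              so every following line is printed until the exit fires.
awk '/_3$/{exit}/_2$/,0' infile
# prints:
# seq_2
# bbb
# ccc
```

Dropping the /_3$/{exit} rule shows the open-ended range on its own: awk '/_2$/,0' infile prints seq_2 and everything after it, including seq_3 and ddd.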

Just a few words about the beauty of code...
We often play golf[1] here, and we do it for fun.
In my opinion, a piece of code or a program is beautiful when:

- it's self-documenting (!)
- it's concise and simple (as simple as possible)
- it takes advantage of the full functionality/potential of the given programming language

That said, at least as far as my posts are concerned, you should take those obfuscated and golfed samples for what they are.
Try to understand them, use them on the command line, but don't use them in scripts and/or production code.
Think about the next maintainer of that code.


[1] en.wikipedia.org/wiki/Perl_golf#Perl_golf
# 16  
10-13-2009

Thanks, that's good advice about readability. What prompted me to join this forum is the need to learn to write code that runs as quickly as possible. At work I am now writing tools in ksh and nawk that run against gigabytes of data. I have been learning to optimise code recently and am astonished by the speed-up that can be achieved, particularly where extra processes are created inside a loop. One script I optimised recently went from over 5 hours to 20 minutes of run time simply by minimising the number of processes started in two loops! Hence my interest in writing "lean" code... Cheers
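To illustrate the kind of saving described above, here is a minimal sketch under assumed conditions: a hypothetical file data.txt and a toy task (upper-casing the second field). The function names slow and fast are also hypothetical. The first starts one awk process per input line; the second does the same work in a single awk pass.

```shell
#!/bin/sh
# Hypothetical input: 200 lines of "id value", generated by awk itself.
awk 'BEGIN { for (i = 1; i <= 200; i++) print i, "item" i }' > data.txt

# Slow: starts a new awk process (plus a pipe) for every input line.
slow() {
    while read -r id value; do
        printf '%s %s\n' "$id" "$value" | awk '{ print $1, toupper($2) }'
    done < "$1"
}

# Fast: a single awk process reads the whole file once.
fast() {
    awk '{ print $1, toupper($2) }' "$1"
}

slow data.txt > slow.out
fast data.txt > fast.out
cmp slow.out fast.out && echo "same output"
```

Both produce identical output. In ksh or bash, prefixing each call with time (for example, time slow data.txt > /dev/null) shows the cost of the per-line process starts, which grows with the number of input lines.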
# 17  
10-14-2009
Log file data extraction

Hi danmero,

My input file:
Code:
>sequence_1
ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC
ASDSFDFFDFDFFWERERERERFSDFESFSFD
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS
>sequence_3
VEDFGSDGSDGSDGSDGSDGSDGSDG
dDFSDFSDFSDFSDFSDFSDFSDFSDF
SDGFDGSFDGSGSDGSDGSDGSDGSDG
>ABC_6
SAASASASASASASTSDGSDGSDGSDG
dDFSDFSDFSDFSDFSDFSDFSDFSDF
>SDF_7
TASDASDAFSDFSDFSDFSDFSDFSDF
SDGFDGSFDGSGSDGSDGSDGSDGSDG


My desired output file:
Code:
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS
>ABC_6
SAASASASASASASTSDGSDGSDGSDG
dDFSDFSDFSDFSDFSDFSDFSDFSDF
>SDF_7
TASDASDAFSDFSDFSDFSDFSDFSDF
SDGFDGSFDGSGSDGSDGSDGSDGSDG

If I have a long file with many entries, how can I use your script or program to extract only the contents of sequence_2, ABC_6 and SDF_7?
Do you have any idea how I can extract only specific entries from a long file?
When I tried it, the awk script you suggested could only extract sequence_2 from such a file.
Thanks again!
# 18  
10-14-2009
Code:
awk '$1~ /sequence_2|ABC_6|SDF_7/{$1=">"$1;print}' RS=">" ORS="" FS=OFS="\n" file
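"$1;print}'">
For anyone puzzling over how this works, here is the same command with comments, run against a shortened copy of the sample input from the question. One difference is deliberate: on the awk command line, FS=OFS="\n" is a single variable assignment, not a chained one, so it sets FS to the literal string OFS= followed by a newline and leaves OFS unchanged. The sketch therefore sets FS and OFS separately, which makes $1 just the header line. (The original still gives the right output here, because it prints whole records either way.)

```shell
#!/bin/sh
# Shortened copy of the sample input from the question.
cat > file <<'EOF'
>sequence_1
ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS
>ABC_6
SAASASASASASASTSDGSDGSDGSDG
EOF

# RS=">"    each record is one entry: the text between two ">" characters
# FS="\n"   so $1 is the header and the following fields are sequence lines
# OFS="\n"  fields are re-joined with newlines when $1 is changed
# ORS=""    each record already ends with its own newline
# $1=">"$1  puts back the ">" that RS consumed
awk '$1 ~ /sequence_2|ABC_6|SDF_7/ { $1 = ">" $1; print }' \
    RS=">" ORS="" FS="\n" OFS="\n" file
```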



# 19  
11-04-2009
Hi, danmero.
I just found out that by using the code that you suggested:
Code:
awk '$1~ /sequence_2|ABC_6|SDF_7/{$1=">"$1;print}' RS=">" ORS="" FS=OFS="\n" file

If my file also contains headers like ABC_61, ABC_605 or SDF_750, the code extracts all of them as well.
Do you have a better idea that matches exactly and extracts only sequence_2, ABC_6 and SDF_7? Many thanks for your suggestions ^^
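The extra matches come from the regular expression not being anchored: /ABC_6/ also matches inside ABC_61 and ABC_605. One possible fix, sketched here under the assumption that each header line holds nothing but the name, is to anchor the alternation with ^( ... )$ so that $1 must equal one of the names exactly. FS and OFS are set as separate assignments on purpose: written as FS=OFS="\n" on the command line, FS would receive the string OFS= plus a newline, $1 would be the whole record, and the anchored pattern would never match. The sample file is made up for illustration.

```shell
#!/bin/sh
# Made-up input with exact names and look-alike names.
cat > file <<'EOF'
>ABC_6
SAASASASASASASTSDGSDGSDGSDG
>ABC_61
TTTTTTTTTTTTTTTTTTTT
>SDF_750
GGGGGGGGGGGGGGGGGGGG
>SDF_7
TASDASDAFSDFSDFSDFSDFSDFSDF
EOF

# ^( ... )$ anchors the alternation, so $1 (the header line alone, because
# FS is a newline) must equal one of the three names exactly; ABC_61 and
# SDF_750 are no longer selected.
awk '$1 ~ /^(sequence_2|ABC_6|SDF_7)$/ { $1 = ">" $1; print }' \
    RS=">" ORS="" FS="\n" OFS="\n" file
```

If the file has DOS (CRLF) line endings, each header ends in a carriage return and the anchored match will fail; strip the \r characters first, for example with tr -d '\r'.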
# 20  
11-05-2009
Let's try a workaround:
Code:
awk '$1~">"{f=0}$1~">" && $1~/[sequence_2|ABC_6|SDF_7]$/{f=1}f'  file

# 21  
11-05-2009
It seems this doesn't work either.
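The workaround above fails because [sequence_2|ABC_6|SDF_7] is a bracket expression: it matches one single character from that set (the | is just another literal character inside the brackets), so /[...]$/ selects every header whose last character is 2, 6, 7, _ or one of the listed letters, for example sequence_12 or ABC_16. A minimal sketch that keeps the same flag idea but compares each header name exactly against a lookup table instead of a regular expression; the variable names want, keep and name and the sample file are assumptions for illustration, not from the thread.

```shell
#!/bin/sh
# Made-up input: wanted entries mixed with look-alike headers.
cat > file <<'EOF'
>sequence_12
AAAAAAAAAAAAAAAAAAAA
>sequence_2
ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD
>ABC_16
CCCCCCCCCCCCCCCCCCCC
>ABC_6
SAASASASASASASTSDGSDGSDGSDG
EOF

awk -v want='sequence_2 ABC_6 SDF_7' '
BEGIN {
    # Lookup table: keep[name] exists for each wanted header name.
    n = split(want, names, " ")
    for (i = 1; i <= n; i++)
        keep[names[i]] = 1
}
/^>/ {
    # On a header line, drop the leading ">" and test the exact name.
    name = substr($1, 2)
    f = 0
    if (name in keep) f = 1
}
f   # print the header and its sequence lines while the flag is set
' file
```

To change the selection, edit the space-separated list passed with -v want=...; because the lookup is an exact string comparison, look-alike names such as ABC_61 or SDF_750 are never selected.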
