awk used to extract data between text


 
# 1  
Old 06-17-2013

Hello all,
I have a file (filename.txt) with some data (in two columns X and Y) which looks like this:
Code:
##########
'Header1'
'Sub-header1'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

'Sub-header2'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

'Sub-header3'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

#######
'Header2' 
'Sub-header1'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

'Sub-header2'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

'Sub-header3'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

...and so on...

The three 'Sub-headers' under each header are the same every time. What I want is to extract the data that sits between the 'Sub-headers'. Right now I am applying the following command:
Code:
awk '/Sub-header1/ {getline;getline}{j++}j==1{flag=1;next} /Sub-header2/ {i++}i==1{flag=0} flag {print}' filename.txt > ofile.txt

I am using {getline;getline} to skip the 'Sub-header1' and 'X Y' lines. It does skip those two lines, but it also prints 'Header1' (something I really don't get) along with the data I wanted.
The reason I want just the data is that I want to plot it with Python (but that's another story). I would also like to get rid of the blank line at the bottom of the data set I am extracting. I tried using the blank line (\/n) as the second pattern instead of 'Sub-header2', but it didn't work.
I've been told not to "abuse" the getline command, since it can give unexpected results unless you really understand what it does. I also found the option of using 'c&&!--c;/Sub-header1/ {c=3} etc... to skip to the third line after the pattern (Sub-header1), but this gives me something even more unexpected.
Hopefully someone followed me up to this point :)
Thank you very much!
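A possible explanation for the stray 'Header1', offered as a sketch rather than a definitive diagnosis: a block with no pattern in front of it, such as {j++}, runs for every input line, not only for lines matching /Sub-header1/. So j reaches 1 on the very first line of the file (the ########## line), flag is switched on there, and 'Header1' gets printed before any sub-header has been seen. The five-line input below is made up, and trace_j is just an illustrative name:

```shell
# Hypothetical input mimicking the top of filename.txt.
trace_j() {
    printf '%s\n' '##########' "'Header1'" "'Sub-header1'" 'X Y' '1.00 10.000' |
    awk '/Sub-header1/ {getline; getline} {j++} j==1 {print "j==1 on input line " NR ": " $0}'
}

trace_j
# prints: j==1 on input line 1: ##########
```

Attaching the counter to the pattern itself (for example /Sub-header1/ && ++j==1) makes it count matches instead of lines.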
# 2  
Old 06-17-2013
What do you actually want to print?
The following prints all sections following /Sub-header1/; it stops printing when it meets an empty line, /^$/:
Code:
awk '/Sub-header1/ {getline;getline;flag=1} /^$/ {flag=0} flag {print}' filename.txt

Without getline:
Code:
awk '/Sub-header1/ {flag=1;c=3} /^$/ {flag=0} flag && !(c && --c) {print}' filename.txt
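
To see why the countdown skips exactly two lines, the command can be tried on a small file. This is only a sketch: the numeric values are placeholders, and the temporary file and the extract_all name are made up for illustration.

```shell
# Made-up sample data in the layout from post #1 (values are placeholders).
sample=$(mktemp)
cat > "$sample" <<'EOF'
##########
'Header1'
'Sub-header1'
X Y
1.00 10.000
2.00 20.000

'Sub-header2'
X Y
3.00 30.000

#######
'Header2'
'Sub-header1'
X Y
5.00 50.000
6.00 60.000

'Sub-header2'
X Y
7.00 70.000
EOF

# c=3 is set on the 'Sub-header1' line itself. "c && --c" stays true while the
# countdown runs (3->2 on the sub-header, 2->1 on "X Y"), so those two lines
# are not printed. On the third line --c reaches 0 and printing starts.
extract_all() {
    awk '/Sub-header1/ {flag=1; c=3} /^$/ {flag=0} flag && !(c && --c) {print}' "$1"
}

extract_all "$sample"
# prints:
# 1.00 10.000
# 2.00 20.000
# 5.00 50.000
# 6.00 60.000
```

The empty line after each data set resets flag, so neither the blank line nor the 'Sub-header2' block is printed.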

# 3  
Old 06-17-2013
Thanks for your reply. What I want to print is the data that follows the first 'Sub-header1', up to just before 'Sub-header2'. That's why I added the counters
Code:
 {j++}j==1

Then I will modify it to print the data between the second pair of 'Sub-header1' and 'Sub-header2' into a second file, by changing
Code:
j==1

to
Code:
 j==2

I tried the line you gave me, and I see what it does: it prints all the sets of data between these patterns together. I will now try adding my counters to see if I get what I wanted.
Thanks,
# 4  
Old 06-17-2013
Below is your example; my awk script prints the lines marked with <this
Code:
##########
'Header1'
'Sub-header1'
X                    Y
xxxx.xx       yyyy.yyy   <this
xxxx.xx       yyyy.yyy   <this
....                 ... <this

'Sub-header2'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

'Sub-header3'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

#######
'Header2' 
'Sub-header1'
X                    Y
xxxx.xx       yyyy.yyy   <this
xxxx.xx       yyyy.yyy   <this
....                 ... <this

'Sub-header2'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

'Sub-header3'
X                    Y
xxxx.xx       yyyy.yyy
xxxx.xx       yyyy.yyy
....                 ...

This User Gave Thanks to MadeInGermany For This Post:
# 5  
Old 06-17-2013
Thanks for the explanation. Now, what can I do if I want to print only the first set of lines marked with <this, or only the second set?
Thanks again!
# 6  
Old 06-17-2013
This prints the 2nd occurrence:
Code:
awk '/Sub-header1/ && ++n==2 {flag=1; c=3} /^$/ {flag=0} flag && !(c && --c) {print}' filename.txt

You can also give the search criteria as additional arguments:
Code:
awk '$0~search && ++n==num {flag=1; c=3} /^$/ {flag=0} flag && !(c && --c) {print}' search="Sub-header1" num=2 filename.txt
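
Since the goal in post #3 was to get each data set into its own file, the num argument can be driven from a shell loop. A rough sketch, assuming two headers in the input; the sample values, the temporary directory, and the set1.txt/set2.txt output names are all made up for illustration:

```shell
# Made-up sample with two headers; values are placeholders.
work=$(mktemp -d)
cat > "$work/filename.txt" <<'EOF'
'Header1'
'Sub-header1'
X Y
1.00 10.000
2.00 20.000

'Sub-header2'
X Y
3.00 30.000

'Header2'
'Sub-header1'
X Y
5.00 50.000
6.00 60.000

'Sub-header2'
X Y
7.00 70.000
EOF

# Write the num-th 'Sub-header1' data set to set<num>.txt (hypothetical names).
for num in 1 2; do
    awk '$0~search && ++n==num {flag=1; c=3} /^$/ {flag=0} flag && !(c && --c) {print}' \
        search="Sub-header1" num="$num" "$work/filename.txt" > "$work/set$num.txt"
done

cat "$work/set2.txt"
# prints:
# 5.00 50.000
# 6.00 60.000
```

This reads the input once per data set, which is simple but not the fastest option for large files.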

# 7  
Old 06-17-2013
Thank you so much, I spent hours yesterday trying to figure this out myself! Using awk is fun and simplifies a lot of work when you know how to use it, but in the meantime it can be painful after some hours of trial and error.
Thanks!