awk multiline matching


 
# 1  
Old 08-29-2011

I have a file that looks something like this with lots of text before and after.


Code:
Distance method: Sum of squared size difference (RST)
</data> <pairwiseDifferenceMatrix time="02/08/11 at 13:08:27">

                       1          2
            1  448.82151  507.94231
            2   56.51684  454.02943
</pairwiseDifferenceMatrix> <data>


I want to extract the diagonal values 448.82 and 454.03. I first tried to print the lines that hold those values, but I could only get output from the line matching the search pattern. Is the white space messing things up, or am I not specifying the field separator correctly? Here is the script I am using.

Code:
awk ' BEGIN {FS="\n"} /<pairwiseDifferenceMatrix/{print $3, $4}' inputfile.txt >> outputfile.txt


Any advice would be greatly appreciated.
Thank you

Last edited by radoulov; 08-30-2011 at 05:06 PM.. Reason: Code tags.
# 2  
Old 08-29-2011
Can you explain a bit more?

If you have these tags in your file:

Code:
<pairwiseDifferenceMatrix time="02/08/11 at 13:08:27">

 1 2
 1 448.82151 507.94231
 2 56.51684 454.02943
 </pairwiseDifferenceMatrix> <data>

Are you always going to be looking for the first number on the line beginning with "1" and the second number on the line beginning with "2"?
# 3  
Old 08-30-2011
Yes, but there are other matrices in the file with a similar format. I need to pull the data out of this one, which has the unique phrase "<pairwiseDifferenceMatrix" preceding it. The timestamp will also change across the different files I'll be using this script on.
# 4  
Old 08-30-2011
Do it in small steps. Test the input and output at each step and learn incrementally:
1. Find all your chunks and learn their structure:
Code:
awk '/<pairwise.../, /<\/pairwise.../' INPUTFILE

2. It looks like you want to process only the lines with 3 fields, so pipe the output to the next awk:
Code:
awk 'NF == 3'

Maybe it's not enough and you want something like
Code:
awk 'NF == 3 && $1 ~ /[0-9]/'

or
Code:
... && $2 ~ /^[0-9.]+$/ && $3 ~ ...

3. You should get an even(!) number of lines. You can check this by piping to "wc -l".
4. You can join each pair of lines into one with:
Code:
sed 'N; s/\n/ /'

5. And cut them with
Code:
cut -d' ' -f2,6

6. And then you can format your numbers with printf:
Code:
xargs printf "%.2f %.2f\n"

The final result:
Code:
awk '/<pairwise/, /<\/pairwise/' INPUTFILE | 
awk 'NF == 3' | 
sed 'N; s/\n/ /' | 
cut -d' ' -f2,6  | 
xargs printf "%.2f %.2f\n"

And once more: you can change, tune, and test each step separately.
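One caveat with the pipeline above: cut -d' ' treats every single space as a delimiter, so it relies on the rows being separated by exactly one space with no leading indentation. A minimal sketch, assuming the heavily indented layout from the first post, normalizes the whitespace in the second awk before the lines are joined. The file name sample.txt is hypothetical and only used for this illustration.

```shell
# Recreate the sample block from the first post (hypothetical file name).
cat > sample.txt <<'EOF'
Distance method: Sum of squared size difference (RST)
</data> <pairwiseDifferenceMatrix time="02/08/11 at 13:08:27">

                       1          2
            1  448.82151  507.94231
            2   56.51684  454.02943
</pairwiseDifferenceMatrix> <data>
EOF

# Same steps as above, except that assigning $1 = $1 makes awk rebuild the
# line with single spaces, so cut sees exactly one delimiter between fields.
diagonal=$(
  awk '/<pairwise/, /<\/pairwise/' sample.txt |
  awk 'NF == 3 { $1 = $1; print }' |
  sed 'N; s/\n/ /' |
  cut -d' ' -f2,6 |
  xargs printf "%.2f %.2f\n"
)
echo "$diagonal"
```

With the sample data this should print 448.82 454.03. If the input rows are already single-space separated, the extra assignment is harmless.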
# 5  
Old 08-30-2011
Try this awk script...
Code:
awk '
   /pairwiseDifferenceMatrix/     {f=1}
   /^<\/pairwiseDifferenceMatrix/ {f=0}
   f && NF==3 {printf("%s ",$2);getline;print $3}
' file
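Run as written on the sample from the first post, this script prints the values unrounded (448.82151 454.02943). A small variation, sketched here on the assumption that two decimals are wanted as in the original question, stores the first value and formats both with printf. The file name matrix.txt is hypothetical.

```shell
# Sample block from the first post, saved under a hypothetical file name.
cat > matrix.txt <<'EOF'
</data> <pairwiseDifferenceMatrix time="02/08/11 at 13:08:27">

                       1          2
            1  448.82151  507.94231
            2   56.51684  454.02943
</pairwiseDifferenceMatrix> <data>
EOF

rounded=$(awk '
   /pairwiseDifferenceMatrix/     {f=1}
   /^<\/pairwiseDifferenceMatrix/ {f=0}
   # Remember $2 of the first row, read the second row, then round both.
   f && NF==3 {a=$2; getline; printf("%.2f %.2f\n", a, $3)}
' matrix.txt)
echo "$rounded"
```

This should print 448.82 454.03 for the sample data.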

# 6  
Old 08-30-2011
Shamrock,
Would you mind breaking that awk command apart with comments? I'm trying to understand what the purpose of {f=1} and {f=0} is.

This looks like something I may be able to use at some point, but I apologize, I don't understand what parts of it are doing.
# 7  
Old 08-30-2011
awk scripts are made up of pattern/action pairs. awk tests every pattern against each line it reads and runs the action of each pattern that matches.

/pairwiseDifferenceMatrix/ tells awk that whenever it sees that pattern on a line, it should set the flag variable f to 1. This is like saying START.

/^<\/pairwiseDifferenceMatrix/ tells awk that whenever a line begins with that pattern, it should set f back to zero. This is like saying STOP. The closing tag also matches the first pattern, but because this rule runs second, f still ends up as 0.

f && NF==3 tells awk that if f is non-zero and NF (the number of fields) equals 3, it should print the 2nd field of the current line, then read the next line with getline and print its 3rd field.
Code:
awk '
   /pairwiseDifferenceMatrix/     {f=1}
   /^<\/pairwiseDifferenceMatrix/ {f=0}
   f && NF==3 {printf("%s ",$2);getline;print $3}
' file
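The getline approach handles exactly two rows. As a hedged generalization that is not part of the original answer, a row counter can pick the diagonal of a larger matrix, because the diagonal entry of row i sits in field i+1 once the row label is counted. The 3x3 values below are made up purely for illustration, and big.txt is a hypothetical file name.

```shell
# Made-up 3x3 matrix, for illustration only (not data from the thread).
cat > big.txt <<'EOF'
<pairwiseDifferenceMatrix time="hypothetical">

               1         2         3
     1  10.11100  12.00000  13.00000
     2  21.00000  20.22200  23.00000
     3  31.00000  32.00000  30.33300
</pairwiseDifferenceMatrix>
EOF

diag=$(awk '
   /<pairwiseDifferenceMatrix/   {f=1; n=0; row=0; out=""; next}   # START, reset state
   /<\/pairwiseDifferenceMatrix/ {if (f) print out; f=0; next}     # STOP, emit the diagonal
   f && NF && !n                 {n=NF; next}                      # header row: n column labels
   f && NF==n+1 {                                                  # data row: label plus n values
      row++
      out = out (row>1 ? " " : "") sprintf("%.2f", $(row+1))
   }
' big.txt)
echo "$diag"
```

For the made-up input this should print 10.11 20.22 30.33. Because the opening pattern includes the leading "<", it does not match the closing tag, so no "^" anchor is needed here.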


Last edited by shamrock; 08-30-2011 at 05:34 PM..