Parsing String, Search then display rows

04-25-2008

Registered User

11, 0

Join Date: Mar 2006

Last Activity: 9 June 2009, 12:04 AM EDT

Posts: 11

Thanks Given: 0

Thanked 0 Times in 0 Posts

Search a line then display next 2 rows

Find each occurrence of "open", allowing for duplicates (use the last "open").
Once you are at the last "open", count 2 rows down to get the correct data.

Between every begin and end statement there is a "close" and an "open".
There can be many "close" and "open" entries within one begin/end block, but
we are only concerned with the last "open" entry.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Input file:
begin
open
datacall1
data1

close
datacall1
data2

open
datacall1
data3
end

begin
close
datacall1
data4

open
datacall1
data5

close
datacall1
data6
end

begin
....
end
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Output:
begin|data3
begin|data5
...........
begin|datax

Thank you for your help. It is very much appreciated.

Last edited by buddyme; 04-26-2008 at 10:34 PM..

buddyme

04-25-2008

This is what I have so far.

$ awk -F " " /open/'{getline;getline;print $0}' openclose
data1
data3
data5

The problem is that I have to compare data1 and data3 and decide that data3 is the last one.

The same thing using sed; I still need to get data3 without displaying data1:
$ sed -n '/open/{n;n;p;}' openclose
data1
data3
data5

Last edited by buddyme; 04-25-2008 at 07:10 PM..

buddyme

04-26-2008

Registered User

3,653, 12

Join Date: Mar 2008

Last Activity: 28 March 2011, 6:41 AM EDT

Location: /there/is/only/bin/sh

Posts: 3,653

Thanks Given: 0

Thanked 12 Times in 10 Posts

Your description is not very understandable, but if I understand correctly, you want the data after the last open before each end?

Code:

awk '/open/ { open = 1 }
open && /^data[0-9]/ { data=$0; open = 0 }
/end/ { print data; data = ""; open = 0 }' file

When we see an open, we remember it. When we see a data and have seen an open, we remember the data line, and reset the open state variable to 0. So if more open lines come after this one, but before the end, they will replace the data we have in memory. Now simply print that data when you see an end, and reset the state variables (open, too, just in case).
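
As a quick check of that logic against the sample input from the first post, here is a minimal sketch. The function name last_open_data is made up for this example, the patterns are anchored at the start of the line because the sample is flush left, and the "begin|" prefix is added only to match the requested output format.

```shell
# Hypothetical helper name; reads the named files, or stdin when none are given.
last_open_data() {
    awk '
    /^open/              { open = 1 }              # an "open" line arms the capture
    open && /^data[0-9]/ { data = $0; open = 0 }   # first data line after it; a later "open" overwrites
    /^end/               { print "begin|" data; data = ""; open = 0 }
    ' "$@"
}

last_open_data <<'EOF'
begin
open
datacall1
data1

close
datacall1
data2

open
datacall1
data3
end

begin
close
datacall1
data4

open
datacall1
data5

close
datacall1
data6
end
EOF
# prints:
# begin|data3
# begin|data5
```

Note that "datacall1" does not match /^data[0-9]/, so only the row after it is captured.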

era

04-26-2008

Thank you so much, era.

It is very close. Sorry, I know my example is confusing.
Let me give another example. Basically, when you see the last "open", you count 2 rows down to get the line you want.
Below, when it sees the last "open", it counts 2 rows down to get "visualbasic".

As an added requirement, I want the date too: "20080204|visualbasic".
----------------------------------------------------------------------

Code:

begin|20080204
 open
   oracle   
   adabas
   server2000
 close 
   html 
   php
   java
   program1
   tcpcaller
 open
   applet
   visualbasic
   winrunner
   qtp
   loadrunner
end

begin|20080409
.....
end

Output:
20080204|visualbasic
20080409|<nextdata>

Thanks! Any help is very much appreciated.

Last edited by buddyme; 04-26-2008 at 10:31 PM..

buddyme

04-27-2008

Registered User

544, 43

Join Date: Oct 2006

Last Activity: 27 March 2017, 3:00 AM EDT

Location: Belgium

Posts: 544

Thanks Given: 5

Thanked 43 Times in 29 Posts

If the records are separated by an empty line, as in your sample file above, and if every record has *exactly* the same number of fields (lines), it's easy:

Code:

#!/usr/bin/awk -f

BEGIN {
        RS=""
        FS="\n"
}

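{
        split($1, out, "|")     # $1 is "begin|<date>"; out[2] is the date
        sub(/ +/, "", $14)      # strip the leading spaces from line 14
        print out[2] "|" $14    # in the sample, line 14 is 2 rows below the last "open"
}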

This will not work if the conditions above are not met.

ripat

04-27-2008

Here's a slight revamp of my earlier script. When you see "open", start a countdown, and when that reaches zero, grab that line and remember it in data. Grabbing the date (I called it "heading"; maybe you want to change that) is a trivial addition. I set the input field separator to '|' to make it easy to get the date.

Code:

awk -F '|' '/begin/ { heading = $2 }
/open/ { count = 2; next }
count { if (! --count) data = $0 }
/end/ { print heading "|" data; heading = data = ""; count = 0 }' file

This doesn't trim whitespace before the data value. If that's required, or if the -F option causes trouble elsewhere in the file, maybe something like

Code:

awk '/begin/ { heading = $0; sub (/^[^|]+\|/, "", heading) }
/open/ { count = 2; next }
count { if (! --count) data = $1 }
/end/ { print heading "|" data; heading = data = ""; count = 0 }' file

If the indentation patterns are consistent throughout the file, maybe you want to tighten up the regular expressions to /^begin/ (flush at start of line), /^ open/ (one space before "open"), and /^end/ (flush at start of line, again) in order to avoid accidental matches (maybe you have "openoffice" or "blender" somewhere in those data?)
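
To illustrate the tightened patterns, here is a minimal sketch, assuming the indentation in the sample (begin and end flush left, exactly one space before open) holds for the whole file. The function name last_open_strict is made up for this example, and it uses split() for the date. The "blender" and "openoffice" rows are invented test data, added to show that they no longer trigger the end and open rules.

```shell
# Hypothetical helper name; reads the named files, or stdin when none are given.
last_open_strict() {
    awk '
    /^begin/ { split($0, parts, "|"); heading = parts[2] }   # date after the "|"
    /^ open/ { count = 2; next }                             # exactly one space before "open"
    count    { if (! --count) data = $1 }                    # $1 drops the surrounding whitespace
    /^end/   { print heading "|" data; heading = data = ""; count = 0 }
    ' "$@"
}

last_open_strict <<'EOF'
begin|20080204
 open
   oracle
   adabas
 close
   blender
   openoffice
 open
   applet
   visualbasic
   winrunner
end
EOF
# prints:
# 20080204|visualbasic
```

With the unanchored /end/ and /open/ patterns, "blender" would print a record early and "openoffice" would restart the countdown.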

(Hmm, ripat's use of split to extract the date from the heading is certainly more elegant than my attempt. I'll leave it just to show that there is always more than one way to do it.)

Last edited by era; 04-27-2008 at 03:16 AM.. Reason: split() would have been more elegant, harumph

era

04-27-2008

And if the begin --> end records have a variable number of fields, there is also a solution using "open" as the field separator. This isolates the text after the second "open" (field $3), which is the last one in your sample. A little string manipulation then returns the desired result. You still need an empty line as the record separator.

Code:

#!/usr/bin/awk -f

BEGIN {
        RS=""
        FS="open"
}

{
        split($1, dte, "|")
        split($3, out, "\n")
        gsub(/\n| /, "", dte[2])
        sub(/ +/, "", out[3])
        print dte[2] "|" out[3]
}
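
If the number of "open" lines per record also varies, a possible variation is to take the text after the last "open" rather than the second one. The sketch below is an assumption-laden example, not a drop-in replacement: the function name last_open_by_split is made up, the second record is invented test data (the original sample elides it), and it calls split() on the whole record instead of setting FS, because as far as I know awk implementations differ on whether a newline still acts as a field separator when RS is empty and FS has more than one character.

```shell
# Hypothetical helper name; reads the named files, or stdin when none are given.
last_open_by_split() {
    awk '
    BEGIN { RS = "" }                     # paragraph mode: blank lines separate records
    {
        n = split($0, seg, /open/)        # seg[n] is everything after the last "open"
        if (n < 2) next                   # no "open" in this record
        split(seg[1], dte, "|")           # seg[1] is "begin|<date>" plus trailing newline and space
        gsub(/[ \n]/, "", dte[2])
        split(seg[n], out, "\n")          # out[1] is the rest of the "open" line, out[3] is 2 rows down
        gsub(/^ +| +$/, "", out[3])
        print dte[2] "|" out[3]
    }' "$@"
}

last_open_by_split <<'EOF'
begin|20080204
 open
   oracle
   adabas
   server2000
 close
   html
   php
   java
   program1
   tcpcaller
 open
   applet
   visualbasic
   winrunner
   qtp
   loadrunner
end

begin|20080409
 open
   cobol
   fortran
 close
   pascal
 open
   python
   ruby
 close
   lisp
 open
   shell
   perl
end
EOF
# prints:
# 20080204|visualbasic
# 20080409|perl
```

Like the FS version, this still breaks if the string "open" appears inside a data row.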

ripat
