awk parsing problem


 
# 1  
Old 07-16-2014
Hello fellow unix geeks,

I am having a small dilemma trying to parse a log file. Below is a sample of what it looks like:

Code:
MY_TOKEN1(group) TOKEN(other)|SSID1
MY_TOKEN2(group, group2)|SSID2

I need to keep only the MY_TOKEN pieces and, where a token holds multiple groups (see MY_TOKEN2), break them out so that I have one per line, like this:

Code:
MY_TOKEN1(group)|SSID1
MY_TOKEN2(group)|SSID2
MY_TOKEN2(group2)|SSID2

I will be matching this data up to a second file, which I can do. I just can't get these tokens to break out properly AND print only those tokens. I could even accept the other entries being on their own lines, as I could just strip them out afterwards.

Thanks in advance!!
# 2  
Old 07-16-2014
The following seems to do what you describe.
As awk reads each line, the FS splits it into $1, $2, ... at every "(" and ")" (surrounding whitespace allowed).
Then split() breaks $2 into A[1], A[2], ... at each "," (surrounding whitespace allowed).
These can be printed in a loop.
Code:
awk 'BEGIN {FS="[ \t]*[)(]+[ \t]*"}
$1~/^MY_TOKEN/ {
  n=split($2,A,"[ \t]*,[ \t]*")
  for (i=1; i<=n; i++) print $1 "(" A[i] ")" $NF
}
' file
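
To see what that FS actually does, here is a small sketch that prints every field awk produces for the two sample lines from post #1. The variable names `sample` and `fields` are only for illustration; the FS is the one from the script above.

```shell
# Sample input copied from post #1.
sample='MY_TOKEN1(group) TOKEN(other)|SSID1
MY_TOKEN2(group, group2)|SSID2'

# Print each field with its index, using the parenthesis-based FS.
fields=$(printf '%s\n' "$sample" |
  awk 'BEGIN {FS="[ \t]*[)(]+[ \t]*"}
       {for (i=1; i<=NF; i++) printf "%d:[%s]%s", i, $i, (i<NF ? " " : "\n")}')
printf '%s\n' "$fields"
```

For the first line, $1 is the token name, $2 is the group list, and $NF is the "|SSID1" tail, which is why the script prints $1, each piece of $2, and $NF. The unwanted TOKEN(other) ends up in $3 and $4 and is simply never printed.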

This User Gave Thanks to MadeInGermany For This Post:
# 3  
Old 07-16-2014
Thanks MadeInGermany.

That got me closer to what I needed. I failed to mention that "MY_TOKEN1" could be anywhere in the field, but I worked around that by removing the caret. However, after running the script against my data, I am getting a lot more data than expected. What I really want are ONLY those tokens that include "MY_TOKEN#", excluding all other elements in that field. This is the part that is really eluding me.
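
One way to read this requirement is: pull every MY_TOKEN#(...) occurrence out of the first "|"-separated field, wherever it sits, and ignore everything else. The sketch below does that with a match() loop. It assumes "#" stands for one or more digits, and the three input lines are made up for illustration (they are not from the real log).

```shell
# Hypothetical input: MY_TOKEN entries in different positions, plus a line with none.
sample='TOKEN(other) MY_TOKEN1(group)|SSID1
MY_TOKEN2(group, group2) FOO(x) MY_TOKEN3(y)|SSID2
TOKEN(only)|SSID3'

tokens=$(printf '%s\n' "$sample" |
  awk -F'|' '
  {
    rest = $1
    # Consume the field one MY_TOKEN<digits>(...) occurrence at a time.
    while (match(rest, /MY_TOKEN[0-9]+\([^)]*\)/)) {
      tok  = substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
      p    = index(tok, "(")                      # position of the opening paren
      name = substr(tok, 1, p - 1)
      n = split(substr(tok, p + 1, length(tok) - p - 1), A, "[ \t]*,[ \t]*")
      for (i = 1; i <= n; i++) print name "(" A[i] ")|" $2
    }
  }')
printf '%s\n' "$tokens"
```

Lines without any MY_TOKEN entry never enter the loop, so they produce no output at all.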
# 4  
Old 07-16-2014
Please show us what went wrong with your data. It would have been best to post a representative sample in the first place.

---------- Post updated at 20:48 ---------- Previous update was at 20:34 ----------

Try this if it works for your data:
Code:
awk     '       {match ($1, /MY_TOKEN.\([^)]*\)/)
                 MT=substr($1, RSTART, 9)
                 n=split(substr($1, RSTART+10, RLENGTH-11),T,"[ ,]*")
                 for (i in T) print MT "(" T[i] ")", $2}
        ' FS="|" OFS="|" file
MY_TOKEN1(group)|SSID1
MY_TOKEN2(group)|SSID2
MY_TOKEN2(group2)|SSID2

This User Gave Thanks to RudiC For This Post:
# 5  
Old 07-16-2014
Apologies for not posting actual data. I cleaned it up a bit for outside consumption and limited the amount for brevity, but made sure to include problem samples. Based on these three lines, I would expect to see one record for the first line and two records for the third line, with the second line producing no output. Eventually, I will want SUDO_GRP and SUDO_ALIAS records each on their own line, but I figure I should tackle this beast one piece at a time, as I can always modify the script and do a second run after the first to keep these two categories separate:

Raw data:
Code:
GRP(adm),SUDO_GRP(rsm)|900000313                                        
SUDO_ALIAS(HAMU1),SUDO_jtw,systest|900001093                     
SUDO_GRP(analyzer,duladm),analyzer,systest,duladm|900001093


Output for above lines should look like this:
Code:
SUDO_GRP(rsm)|900000313
SUDO_GRP(analyzer)|900001093
SUDO_GRP(duladm)|900001093


And here is the code I used. I did modify what RudiC provided me as that got me closer to the end result. When I ran the code provided by RudiC, I just got a blank line and nothing else.
Code:
 awk     '{match ($1, /SUDO_GRP\([^)]*\)/)
            MT=substr($1, RSTART, 8)
            n=split(substr($1, RSTART+8),T,"[,]*")
            for (i in T) print MT "(" T[i] ")", $2}
            ' FS="|" OFS="|" file

And here is the output I am currently getting with the above code:
Code:
SUDO_GRP((rsm))|900000313
SUDO_ALI(IAS(HAMU1))|900001093
SUDO_ALI(SUDO_jtw)|900001093
SUDO_ALI(systest)|900001093
SUDO_GRP((analyzer)|900001093
SUDO_GRP(duladm))|900001093
SUDO_GRP(analyzer)|900001093
SUDO_GRP(systest)|900001093
SUDO_GRP(duladm)|900001093

As you can see, some of the lines are getting extra parentheses added around them, and it is printing the SUDO_ALIAS lines, which I want omitted. It does seem to have finally omitted the GRP-only entries, though.
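
A quick diagnostic, offered as a sketch rather than a definitive analysis, shows what match() reports for each of the three raw lines. The names `raw` and `diag` are illustrative only.

```shell
# Raw data from this post (trailing blanks dropped; they only affect $2).
raw='GRP(adm),SUDO_GRP(rsm)|900000313
SUDO_ALIAS(HAMU1),SUDO_jtw,systest|900001093
SUDO_GRP(analyzer,duladm),analyzer,systest,duladm|900001093'

diag=$(printf '%s\n' "$raw" |
  awk -F'|' '{
    match($1, /SUDO_GRP\([^)]*\)/)
    # The action runs for every record, whether the match succeeded or not.
    printf "line %d: RSTART=%d RLENGTH=%d\n", NR, RSTART, RLENGTH
  }')
printf '%s\n' "$diag"
```

On the second line the match fails, so RSTART is 0 and RLENGTH is -1, yet the action still runs and the substr() calls work from the start of the field. That would account for the SUDO_ALI... records. On the matching lines, RSTART+8 points at the "(" itself, and substr() without a length runs to the end of the field, which would account for the doubled parentheses and the extra entries after the closing ")".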

Last edited by dagamier; 07-17-2014 at 11:03 AM.. Reason: forgot to add the output I am getting based on the provided input data
# 6  
Old 07-17-2014
You should have transcribed the above literally. Try this (using "match..." as a pattern, not as part of the action):
Code:
awk     'match ($1, /SUDO_GRP\([^)]*\)/)        {MT=substr($1, RSTART, 8)
                                                 # skip "SUDO_GRP(" (9 chars) and drop the closing ")"
                                                 n=split(substr($1, RSTART+9, RLENGTH-10),T,",")
                                                 for (i=1; i<=n; i++) print MT "(" T[i] ")", $2}
        ' FS="|" OFS="|" file
SUDO_GRP(rsm)|900000313                                        
SUDO_GRP(analyzer)|900001093
SUDO_GRP(duladm)|900001093

---------- Post updated at 17:19 ---------- Previous update was at 17:13 ----------

Try this to become more flexible:
Code:
awk     'match ($1, MATCH"\([^)]*\)")   {L=length(MATCH)
                                         n=split(substr($1, RSTART+L+1, RLENGTH-L-2), T, "[,]*")
                                         for (i in T) print MATCH "(" T[i] ")", $2}
        ' FS="|" OFS="|" MATCH="SUDO_ALIAS" file
SUDO_ALIAS(HAMU1)|900001093

This User Gave Thanks to RudiC For This Post:
# 7  
Old 07-17-2014
RudiC,

Thanks so much. This actually worked and I like the idea of using the "MATCH=" to set what I want to look for. Totally makes life easier. The only thing I had to do was change the RLENGTH to subtract 1 instead of 2 as it was removing a character from some of the values inside the parentheses.
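
A possible explanation for that off-by-one, stated as an assumption since the thread does not say which awk was in use: inside a string constant, "\(" is not a defined escape sequence. Some awks (gawk, for instance) warn and treat it as a plain "(", so the parentheses become a regex group, the match stops just before the closing ")", and RLENGTH comes out one shorter. Doubling the backslashes avoids the ambiguity and keeps the original RLENGTH-L-2 arithmetic. The sketch below wraps this in a hypothetical shell function `extract` and runs it for both categories.

```shell
# Raw data from post #5 (trailing blanks dropped).
raw='GRP(adm),SUDO_GRP(rsm)|900000313
SUDO_ALIAS(HAMU1),SUDO_jtw,systest|900001093
SUDO_GRP(analyzer,duladm),analyzer,systest,duladm|900001093'

# extract NAME: print one NAME(member)|id record per member (hypothetical helper).
extract() {
  awk -F'|' -v MATCH="$1" '
  BEGIN { re = MATCH "\\([^)]*\\)" }   # "\\(" in a string yields the regex \(
  match($1, re) {
    L = length(MATCH)
    # Skip NAME plus "(" and drop the closing ")".
    n = split(substr($1, RSTART + L + 1, RLENGTH - L - 2), T, ",")
    for (i = 1; i <= n; i++) print MATCH "(" T[i] ")|" $2
  }'
}

grp_out=$(printf '%s\n' "$raw" | extract SUDO_GRP)
alias_out=$(printf '%s\n' "$raw" | extract SUDO_ALIAS)
printf '%s\n' "$grp_out" "$alias_out"
```

The indexed loop is used instead of `for (i in T)` because the traversal order of `for ... in` is unspecified, so members could otherwise come out in a different order than they appear in the input.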