Separate complicated fields with awk


 
# 1  
01-27-2009

Hello, I want to separate the fields in log output like this:


11-JUL-2008 23:14:25 * (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)(HOST=X900005199)(USER=FTET1))) * (ADDRESS=(PROTOCOL=tcp)(HOST=45.137.251.223)(PORT=2196)) * establish * WUMMER.IM.HERE.EXELLENT.COM * 0
11-JUL-2008 23:20:20 * (CONNECT_DATA=(SID=P1VPMHAM)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=133.52.24.148)(PORT=1462)) * establish * WUMMER * 0

into:

$1 = 11-JUL-2008 23:14:25
$2 = (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)
$3 = (CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)
$4 = (HOST=X900005199)
$5 = (USER=FTET1)
$6 = (ADDRESS=(PROTOCOL=tcp)
$7 = (HOST=45.137.251.223)
$8 = (PORT=2196)

I've experimented with the field separator (FS), with mixed results:
awk -F'(*[^(]*)' '{ print $1 " " $2 " " $3 }' listener.log

Does anyone have an idea? I think I need the correct regular expression.
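A single FS regex struggles here because the wanted fields are nested parenthesised groups rather than text between separators. One hedged alternative is to match the wanted tokens themselves with POSIX awk's match() and collect them one by one. This is a sketch under stated assumptions: parameter names are upper-case letters or underscores, values contain no parentheses, and `listener_fields` is a hypothetical helper name.

```shell
# Sketch: collect the wanted fields with match() instead of FS.
# Assumptions: parameter names are [A-Z_]+ and values contain no "(" or ")".
# listener_fields is a hypothetical helper name, not part of any tool.
listener_fields() {
    awk '{
        out = $1 " " $2          # date and time: the first two blank-separated words
        rest = $0
        # a token = any number of "(NAME=" openers followed by one "(NAME=value)"
        while (match(rest, /([(][A-Z_]+=)*[(][A-Z_]+=[^()]*[)]/)) {
            out = out "|" substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)   # continue after this token
        }
        print out                # fields joined with "|"
    }'
}

listener_fields <<'EOF'
11-JUL-2008 23:14:25 * (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)(HOST=X900005199)(USER=FTET1))) * (ADDRESS=(PROTOCOL=tcp)(HOST=45.137.251.223)(PORT=2196)) * establish * WUMMER.IM.HERE.EXELLENT.COM * 0
EOF
```

On the sample lines this yields the eight fields from the question, pipe-delimited; the trailing "establish * ... * 0" part is skipped because it contains no parenthesised token.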
# 2  
01-27-2009
This may start you off...

What I did was replace every ( with ~( so that I could use ~ as the delimiter.


> cat file149
11-JUL-2008 23:14:25 * (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)(HOST=X900005199)(USER=FTET1))) * (ADDRESS=(PROTOCOL=tcp)(HOST=45.137.251.223)(PORT=2196)) * establish * WUMMER.IM.HERE.EXELLENT.COM * 0
11-JUL-2008 23:20:20 * (CONNECT_DATA=(SID=P1VPMHAM)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=133.52.24.148)(PORT=1462)) * establish * WUMMER * 0

Code:
> sed "s/(/~(/g" <file149 >file149.a
> awk -F"~" '{print "1="$1,"\n2="$2$3,"\n3="$4$5,"\n4="$6,"\n5="$7"\n"}' file149.a
1=11-JUL-2008 23:14:25 *  
2=(CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM) 
3=(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe) 
4=(HOST=X900005199) 
5=(USER=FTET1))) * 

1=11-JUL-2008 23:20:20 *  
2=(CONNECT_DATA=(SID=P1VPMHAM) 
3=(CID=(PROGRAM=) 
4=(HOST=__jdbc__) 
5=(USER=))) *
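The same tilde trick can also be done in a single awk pass, without the temporary file149.a, gluing adjacent pieces back together to get all eight wanted fields on one pipe-delimited line. This is a sketch, assuming every line has the same CONNECT_DATA/ADDRESS layout as the two sample lines; `tilde_fields` and `upto_paren` are hypothetical names.

```shell
# Sketch: the tilde trick in one awk pass, with no temporary file.
# Assumes every line has the same CONNECT_DATA/ADDRESS layout as the
# sample lines; tilde_fields and upto_paren are hypothetical names.
tilde_fields() {
    awk '
    # keep a piece up to and including its first ")"
    function upto_paren(s) { return substr(s, 1, index(s, ")")) }
    {
        line = $0
        gsub(/[(]/, "~(", line)       # same idea as: sed "s/(/~(/g"
        split(line, p, "~")           # p[1] = date part, p[2] onward = pieces
        date = p[1]
        sub(/ *[*] *$/, "", date)     # drop the trailing " * "
        out = date "|" p[2] p[3] "|" p[4] p[5] "|" p[6] "|" upto_paren(p[7])
        out = out "|" p[8] p[9] "|" p[10] "|" upto_paren(p[11])
        print out
    }'
}

tilde_fields <<'EOF'
11-JUL-2008 23:20:20 * (CONNECT_DATA=(SID=P1VPMHAM)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=133.52.24.148)(PORT=1462)) * establish * WUMMER * 0
EOF
```

The fixed piece indices are the weak spot: a listener.log line with a different layout would shift them, so lines of another kind would need filtering first.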

# 3  
01-27-2009
an inelegant solution

Forget regular expressions. That isn't going to happen.
What you should probably do is explain what you eventually want to do with
the variables. My initial questions are:
Why awk?
Why do they have to be in positions $1 through $8?
Once there, what do you want to do with them?
My point is that the end result is what you're after (hopefully), not
whether we can put them in positions 1 through 8 for awk to do something with.
That said, here is one way to bend this nasty log file to your whims:

cat << 'EOF' |
11-JUL-2008 23:14:25 * (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)(HOST=X900005199)(USER=FTET1))) * (ADDRESS=(PROTOCOL=tcp)(HOST=45.137.251.223)(PORT=2196)) * establish * WUMMER.IM.HERE.EXELLENT.COM * 0
11-JUL-2008 23:20:20 * (CONNECT_DATA=(SID=P1VPMHAM)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=133.52.24.148)(PORT=1462)) * establish * WUMMER * 0
EOF
###---------------------------------------
### protect the space inside the date (restored at the end)
### and put a space after every closing parenthesis
###---------------------------------------
sed -e 's/ /@/' \
    -e 's/)/) /g' |
###---------------------------------------
### convert all spaces to newlines
###---------------------------------------
tr ' ' '\012' |
###---------------------------------------
### delete blank lines, asterisk-only lines and parenthesis-only lines
###---------------------------------------
sed -e '/^$/d' \
    -e '/^\*/d' \
    -e '/^)$/d' |
###---------------------------------------
### number the lines of each record, restarting at the date line
### (the only one holding an @), and keep only "fields" 1-8
###---------------------------------------
awk '/@/ { n = 0 } { n++; if (n <= 8) print n, $0 }' |
###---------------------------------------
### convert each record back to one line
### (read -r and printf keep the backslashes in the Windows path)
###---------------------------------------
while read -r num line; do
    printf '%s ' "$line"
    if [ "$num" -eq 8 ]; then
        echo
    fi
done |
###---------------------------------------
### and there they are... in positions 1-8
###---------------------------------------
awk 'BEGIN { OFS = "|" }
{ print $1, $2, $3, $4, $5, $6, $7, $8 }' |
###---------------------------------------
### oh. and turn the at sign back into a space in the date.
###---------------------------------------
sed -e 's/@/ /'


It's a complex mess, indeed.

Last edited by quirkasaurus; 01-27-2009 at 11:52 AM.
# 4  
01-27-2009
Quote:
Originally Posted by joeyg
What I did was replace any ( with ~( so I could use the ~ as a delimiter.
Thanks a lot, joeyg, for your solution. Now I can strip out the parts of each line that I don't need.

Best regards, sdohn
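For the clean-up step mentioned here, one possible sketch reduces each "(NAME=value)" field to its bare value, leaving the date field alone. It assumes pipe-delimited eight-field lines such as those produced by the pipeline in post #3, and values that contain no "=" or ")"; `field_values` is a hypothetical helper name.

```shell
# Sketch: reduce each "(NAME=value)" field to just its value.
# Assumes pipe-delimited 8-field lines like the ones built in this thread
# and values without "=" or ")"; field_values is a hypothetical helper name.
field_values() {
    awk 'BEGIN { FS = OFS = "|" }
    {
        for (i = 2; i <= NF; i++) {     # field 1 is the date, keep it as is
            sub(/^.*=/, "", $i)         # drop everything up to the last "="
            sub(/[)]$/, "", $i)         # drop the closing ")"
        }
        print                           # $0 is rebuilt with OFS = "|"
    }'
}

field_values <<'EOF'
11-JUL-2008 23:14:25|(CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)|(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)|(HOST=X900005199)|(USER=FTET1)|(ADDRESS=(PROTOCOL=tcp)|(HOST=45.137.251.223)|(PORT=2196)
EOF
```

Empty values such as "(USER=)" simply become empty fields, which keeps the column count fixed for loading.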
# 5  
01-27-2009
I like the tilde solution, too. Even better!

But I figured I'd post mine anyway; hopefully some of the ideas are useful...
# 6  
01-27-2009
Quote:
Originally Posted by quirkasaurus
Once there, what do you want to do with them?
Thank you for your solution to this complex problem.
My reason was to separate the values so I could put them into a database. Now I can build a report on the data with SQL.

Best regards, sdohn
# 7  
01-27-2009
Cool. Then the script is useful: it converts everything to pipe-delimited output,
so you can just load from there.