Parsing XML in awk : OFS does not work as expected


 
# 1  
Old 12-31-2010

Hi,

I am trying to process a regular XML file in which I need to reduce the number of decimal places in some elements. I am using the following awk command to achieve that:

#!/bin/ksh

EDITCMD='BEGIN { FS = "[\<\>]"; OFS=FS }
{
if ( $3 ~ "[0-9][0-9]*\\.[0-9][0-9]*" && length(substr($3,1+index($3,"."))) == 15 ) {
PRE=substr($3,1,index($3,".")-1);
POST=substr($3,1+index($3,"."),5);
$3 = PRE "." POST
}
{
print $0
}
}'
nawk "$EDITCMD" /path/file.xml


The problem is that I cannot get OFS to print correctly on the lines where the transformation was applied. The output looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Import xmlns:xsi="">
<INSTRUMENT>
<INSTRUMENT_CD>00036AAB1</INSTRUMENT_CD>
<BUNDLE_ID>48328</BUNDLE_ID>
<ACCRUAL_DT>5/8/2001</ACCRUAL_DT>
[<>]AMT_ISU[<>]125000000.00000[<>]/AMT_ISU[<>]
<ANNOUNCE_DT>5/1/2001</ANNOUNCE_DT>
<CD_INSTMT_TYPE>UNKNOWN</CD_INSTMT_TYPE>
<CHANGE_DT>5/7/2009 21:02:01.370</CHANGE_DT>
..
..


What am I doing wrong? The FS definition seems to be correct, since the transformation is applied to the right fields, but why does OFS not produce the matching FS character when the line is printed? Escaping, double-escaping, or not escaping the characters in FS made no difference.

Thanks for your help,

Martin
# 2  
Old 12-31-2010
Try: sub($3,PRE"."POST) instead of $3 = PRE "." POST and then you can leave out OFS=FS
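
For reference, a minimal sketch of the sub() suggestion as a complete script. The function name trim_with_sub and the anchored number test are assumptions added for illustration, not part of the original script. Because sub() edits $0 directly and no field is assigned, awk never rebuilds the record with OFS.

```shell
#!/bin/sh
# Sketch only: trim_with_sub is a hypothetical helper name, not from the thread.
trim_with_sub() {
  awk 'BEGIN { FS = "[<>]" }
  {
    # Similar test to the original script (anchored here, which is an
    # assumption): a plain number with exactly 15 digits after the point.
    if ($3 ~ /^[0-9]+\.[0-9]+$/ && length(substr($3, index($3, ".") + 1)) == 15) {
      PRE  = substr($3, 1, index($3, ".") - 1)
      POST = substr($3, index($3, ".") + 1, 5)
      # sub() works on $0 in place. $3 is used as the pattern, so its "."
      # matches any character, which is harmless for this input.
      sub($3, PRE "." POST)
    }
    print
  }' "$@"
}

printf '%s\n' '<AMT_ISU>125000000.123456789012345</AMT_ISU>' | trim_with_sub
```

With the sample line above this should print <AMT_ISU>125000000.12345</AMT_ISU>, and lines without a 15-decimal value pass through unchanged.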
# 3  
Old 12-31-2010
Thanks Scrutinizer, your advice works fine.

However, I would still like to know how to use OFS properly when FS is a regular expression or a group of characters and I do not want to change the corresponding output separator; I just need to access and modify some of the fields.

Any other ideas?

Thanks & Regards
# 4  
Old 12-31-2010
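Contrary to FS, OFS is never treated as a regex. It is a literal string, and when a field is assigned awk rebuilds the line by joining the fields with that string, so IMO that would not be possible.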
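
One possible workaround, sketched here under assumptions rather than offered as the definitive answer: record the separators that actually occur in each line with match(), modify the field, and then join the fields yourself instead of relying on OFS. The function name rebuild_with_original_seps is hypothetical. (gawk also accepts a fourth "seps" array argument to split() that captures the separators, but the sketch below avoids gawk-only features.)

```shell
#!/bin/sh
# Sketch only: rebuild_with_original_seps is a hypothetical helper name.
rebuild_with_original_seps() {
  awk 'BEGIN { FS = "[<>]" }
  {
    # Walk the line and remember the text of each separator, left to right.
    rest = $0; n = 0
    while (match(rest, /[<>]/)) {
      sep[++n] = substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
    # Keep 5 of the 15 decimals; assigning $3 rebuilds $0 with OFS,
    # but the individual fields stay intact and $0 is not printed.
    if ($3 ~ /^[0-9]+\.[0-9]+$/ && length(substr($3, index($3, ".") + 1)) == 15)
      $3 = substr($3, 1, index($3, ".") + 5)
    # Join the fields again with the separators that were really there.
    out = $1
    for (i = 2; i <= NF; i++) out = out sep[i - 1] $i
    print out
  }' "$@"
}

printf '%s\n' '<AMT_ISU>125000000.123456789012345</AMT_ISU>' | rebuild_with_original_seps
```

The pattern given to match() has to be the same regex as FS so that the recorded separators line up with the field boundaries.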
# 5  
Old 01-01-2011
As it does not look like you are validating tags, and you are shortening any number with 15 decimal places, maybe sed would be a "better" choice:
Code:
sed -e 's/>\([0-9][0-9]*\.[0-9][0-9][0-9][0-9][0-9]\)[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]</>\1</g' inputfile

(Yes, I know "-e" is not necessary, but I am one of those boring, make-it-obvious kinds of people.)
This way, you don't have to worry about whether the file being changed was formatted as shown above, or as:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Import xmlns:xsi="">
<INSTRUMENT><INSTRUMENT_CD>00036AAB1</INSTRUMENT_CD><BUNDLE_ID>48328</BUNDLE_ID><ACCRUAL_DT>5/8/2001</ACCRUAL_DT><AMT_ISU>125000000.123456789012345</AMT_ISU><ANNOUNCE_DT>5/1/2001</ANNOUNCE_DT><CD_INSTMT_TYPE>UNKNOWN</CD_INSTMT_TYPE><CHANGE_DT>5/7/2009 21:02:01.370</CHANGE_DT>...

Now, if you need to make sure the tags match, you can change the regex to:
Code:
s:<\([^>]*\)>\([0-9][0-9]*\.[0-9][0-9][0-9][0-9][0-9]\)[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]</\1>:<\1>\2</\1>:g

Or even list the specific tags you want to change (note that the \| alternation is a GNU sed extension):
Code:
s:<\(AMT_ISU\|anothertag\)>\([0-9][0-9]*\.[0-9][0-9][0-9][0-9][0-9]\)[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]</\1>:<\1>\2</\1>:g
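
The long digit runs in the expressions above can also be written with interval expressions, which are part of POSIX basic regular expressions. A minimal sketch of the first command in that form follows; the wrapper name trim_with_sed is an assumption for illustration, and the pattern is meant to match the same strings as the original.

```shell
#!/bin/sh
# Sketch only: trim_with_sed is a hypothetical wrapper name.
trim_with_sed() {
  # \{5\} keeps five decimals, \{10\} drops the remaining ten;
  # the trailing "<" ensures exactly 15 decimals were present.
  sed -e 's/>\([0-9][0-9]*\.[0-9]\{5\}\)[0-9]\{10\}</>\1</g' "$@"
}

printf '%s\n' '<AMT_ISU>125000000.123456789012345</AMT_ISU><ANNOUNCE_DT>5/1/2001</ANNOUNCE_DT>' | trim_with_sed
```

On the single-line sample this should leave <AMT_ISU>125000000.12345</AMT_ISU> in place and pass the other elements through untouched.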
