How to find differences between 2 XML Files?


 
# 1  
09-08-2014

Hello All,

The requirement is to compare 2 XML files and see whether there are any differences. However, some of our providers send UTF-16 encoded XML files with no end-of-line characters, as shown below.

Excerpt of data file:
Code:
ÿþ<^@?^@x^@m^@l^@ ^@v^@e^@r^@s^@i^@o^@n^@=^@"^@1^@.^@0^@"^@ ^@e^@n^@c^@o^@d^@i^@n^@g^@=^@"^@U^@T^@F^@-^@1^@6^@"^@ ^@?^@>^@<^@P^@r^@o^@v^@i^@d^@e^@r^@ ^@x^^@l^@n^@s^@=^@"^@h^@t^@t^@p^@:^@/^@/^@w^@w^@w^@.^@f^@c^@a^@.^@g^@o^@v^@/^@F^@C^@S^@L^@o^@a^@n^@s^@"^@ ^@x^@m^@l^@n^@s^@:^@x^@s^@i^@=^@"^@h^@t^@t^@p^@:^@/^@/^@w^@w^@w^@.^@w^@3^@.^@o^@r^@g^@/^@2^@0^@0^@1^@/^@X^@M^@L^@S^@c^@h^@e^@m^@a^@-^@i^@n^@s^@t^@a^@n^@c^@e^@"^@

2014-03-31_17_V2.5.XML [readonly][noeol][converted] 2L, 18676154C
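In the excerpt, ÿþ is the byte-order mark FF FE (UTF-16 little-endian) displayed as Latin-1 characters, and each ^@ is the NUL high byte that UTF-16LE stores after every ASCII character. A quick way to confirm the encoding from the shell is to dump the first two bytes. This is a sketch: first_two_bytes is a made-up helper name, and head -c is supported by GNU and BSD head but is not required by POSIX.

```shell
# Hypothetical helper: print the first two bytes of a file (or stdin) in hex.
# head -c works with GNU and BSD head; it is not a POSIX option.
first_two_bytes() {
    head -c 2 "$@" | od -An -tx1 | tr -d ' \n'
}

# Usage:
#   first_two_bytes 2014-03-31_17_V2.5.XML
# "fffe" means a UTF-16 little-endian byte-order mark, "feff" big-endian.
```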


I used the iconv command to convert this file to UTF-8.
Now the data in the XML file is human-readable, but everything is on a single line.

Code:
wc -l 2014-03-31_17_V2.5.XML.utf8
	1 2014-03-31_17_V2.5.XML.utf8
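The exact iconv invocation is not shown in the post. A minimal sketch of such a conversion, assuming GNU iconv and a file that starts with a byte-order mark as in the excerpt (to_utf8 is a hypothetical helper name). Note that iconv does not rewrite the XML declaration, so the converted file still says encoding="UTF-16"; that does not matter for a line-by-line diff, but a strict XML parser may complain about it.

```shell
# Hypothetical helper: convert a UTF-16 file (or stdin) to UTF-8 on stdout.
# With -f UTF-16, GNU iconv uses the byte-order mark to choose the byte
# order and does not copy the mark to the output.
to_utf8() {
    iconv -f UTF-16 -t UTF-8 "$@"
}

# Usage (file name from this thread):
#   to_utf8 2014-03-31_17_V2.5.XML > 2014-03-31_17_V2.5.XML.utf8
#   wc -l 2014-03-31_17_V2.5.XML.utf8
```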

How can I put an end-of-line character after each XML tag?

Once each XML tag in my data file is on its own line, I want to run diff on the two XML files to find the differences. Please help.


Thank you.
# 2  
09-08-2014
If you can live with a newline added after every tag, here is a simple way to do it:
Code:
awk '
BEGIN {	RS = ">"
	ORS = ">\n"
}
{	$1 = $1
	print
}' file.xml

This will add a newline after each greater than sign (which will also add the missing newline to the end of your file).

Change awk to /usr/xpg4/bin/awk if you're using a Solaris/SunOS system.
# 3  
09-08-2014
Quote:
Originally Posted by Don Cragun
If you can live with a newline added after every tag, here is a simple way to do it:
Code:
awk '
BEGIN {	RS = ">"
	ORS = ">\n"
}
{	$1 = $1
	print
}' file.xml

This will add a newline after each greater than sign (which will also add the missing newline to the end of your file).

Change awk to /usr/xpg4/bin/awk if you're using a Solaris/SunOS system.
Sorry, I should have mentioned my OS: I am using Red Hat Linux. I tried running your code but I am getting an error message.

Code:
awk 'BEGIN {RS = ">" ORS = ">\n"}{$1 = $1 print}' 2014-03-31_17_V2.5.XML.utf8

awk: BEGIN {RS = ">" ORS = ">\n"}{$1 = $1 print}
awk:                     ^ syntax error
awk: BEGIN {RS = ">" ORS = ">\n"}{$1 = $1 print}
awk:                                      ^ syntax error

# 4  
09-08-2014
Quote:
Originally Posted by Ariean
Sorry, I should have mentioned my OS: I am using Red Hat Linux. I tried running your code but I am getting an error message.

Code:
awk 'BEGIN {RS = ">" ORS = ">\n"}{$1 = $1 print}' 2014-03-31_17_V2.5.XML.utf8

awk: BEGIN {RS = ">" ORS = ">\n"}{$1 = $1 print}
awk:                     ^ syntax error
awk: BEGIN {RS = ">" ORS = ">\n"}{$1 = $1 print}
awk:                                      ^ syntax error

If you use the code exactly as I gave it, there will not be any syntax errors.
If you randomly combine lines of code without adding statement separators (;), you should expect syntax errors, or just bad results! Please try the code I suggested (keeping the newlines where I had them) and let us know what happens.
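For comparison, here is the script from post #2 written on one line. Each statement that was on its own line is separated with a semicolon, which is what the one-liner in post #3 was missing. This is a sketch; split_tags_oneline is a made-up wrapper name.

```shell
# Post #2's awk script with its line breaks replaced by ";" separators.
# split_tags_oneline is a hypothetical wrapper; it reads files or stdin.
split_tags_oneline() {
    awk 'BEGIN { RS = ">"; ORS = ">\n" } { $1 = $1; print }' "$@"
}

# Usage:
#   split_tags_oneline 2014-03-31_17_V2.5.XML.utf8
```

Inside braces, awk needs either a newline or a semicolon between statements; RS = ">" ORS = ">\n" without one is parsed as a string concatenation being assigned to, which is the syntax error reported above.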
# 5  
09-08-2014
Below is an excerpt of the output of the awk script.

Code:
<?xml version="1.0" encoding="UTF-16" ?>
<Provider PROVIDER="17" SCHEMA_VERSION="2.5">
<Institution UNINUM="xxxx" EXTRACT_DATE="2013-12-31" CUSTOMER_ROW_COUNT="1577" LOAN_ROW_COUNT="3322" BOOK_VALUE_DOLLARS="720163381.46" BOOK_VALUE_COUNT="3115" PAST_DUE_AMOUNT_DOLLARS="3630254.00" PAST_DUE_AMOUNT_COUNT="23" ACCEPTABLE_VOL_DOLLARS="693647325.79" ACCEPTABLE_VOL_COUNT="3058" ACCRUED_INTEREST_DOLLARS="10221888.55" ACCRUED_INTEREST_COUNT="2877" DOUBTFUL_VOL_DOLLARS="2374.32" DOUBTFUL_VOL_COUNT="3" OAEM_VOL_DOLLARS="17835860.04" OAEM_VOL_COUNT="16" PRINCIPAL_BALANCE_DOLLARS="709941492.91" PRINCIPAL_BALANCE_COUNT="3095" SUBSTANDARD_VOL_DOLLARS="8677821.31" SUBSTANDARD_VOL_COUNT="38" PD_RATING_VALUES="20603" PD_RATING_COUNT="3322" BEGINNING_FARMER_FLAG_COUNT="636" SMALL_FARMER_FLAG_COUNT="1505" YOUNG_FARMER_FLAG_COUNT="580">
<Customer CIF="14338">
<BORROWER_NAME>
xxxxxxxxxxxx</BORROWER_NAME>
<FIPS_CODE>
49023</FIPS_CODE>
<RELATED_PARTY_LOAN_CODE>
0</RELATED_PARTY_LOAN_CODE>
<RELATIONSHIP_ESTABLISH_DATE>
1999-03-15</RELATIONSHIP_ESTABLISH_DATE>
<LAST_RISK_RATING_CHANGE_DATE>
2012-06-12</LAST_RISK_RATING_CHANGE_DATE>
<BALANCE_SHEET_DATE>
2012-11-30</BALANCE_SHEET_DATE>
<INCOME_STATEMENT_DATE>
2013-12-31</INCOME_STATEMENT_DATE>
<DEBT_REPAYMENT_COVERAGE_RATIO>
1.2083000000</DEBT_REPAYMENT_COVERAGE_RATIO>
<CURRENT_ASSETS>
2216074.00</CURRENT_ASSETS>
<CURRENT_LIABILITIES>
2036364.00</CURRENT_LIABILITIES>
<FARM_OPS_EXP>
3759280.00</FARM_OPS_EXP>
<GROSS_AG_INC>
3553127.00</GROSS_AG_INC>
<INT_EXP>
123050.00</INT_EXP>
<NON_CURR_ASSET>
14172350.00</NON_CURR_ASSET>
<NON_CURR_LIABILITIES>
2491022.00</NON_CURR_LIABILITIES>
<NET_AG_INC>
-206153.00</NET_AG_INC>
<NET_INC>
498043.00</NET_INC>
<NET_WORTH>
11861038.00</NET_WORTH>
<NONFARM_INC>
704196.00</NONFARM_INC>
<TOTAL_ASSETS>
16388424.00</TOTAL_ASSETS>
<TOTAL_LIABILITIES>
4527386.00</TOTAL_LIABILITIES>
<DEBT_SERVICE_REQUIREMENT>
514028.00</DEBT_SERVICE_REQUIREMENT>
<REPAYMENT_SOURCE>
1</REPAYMENT_SOURCE>
</Customer>
</Institution>
</Provider>
^M>

Is it possible to achieve the expected output below?

Code:
<?xml version="1.0" encoding="UTF-16" ?>
<Provider PROVIDER="17" SCHEMA_VERSION="2.5">
<Institution UNINUM="xxxx" EXTRACT_DATE="2013-12-31" CUSTOMER_ROW_COUNT="1577" LOAN_ROW_COUNT="3322" BOOK_VALUE_DOLLARS="720163381.46" BOOK_VALUE_COUNT="3115" PAST_DUE_AMOUNT_DOLLARS="3630254.00" PAST_DUE_AMOUNT_COUNT="23" ACCEPTABLE_VOL_DOLLARS="693647325.79" ACCEPTABLE_VOL_COUNT="3058" ACCRUED_INTEREST_DOLLARS="10221888.55" ACCRUED_INTEREST_COUNT="2877" DOUBTFUL_VOL_DOLLARS="2374.32" DOUBTFUL_VOL_COUNT="3" OAEM_VOL_DOLLARS="17835860.04" OAEM_VOL_COUNT="16" PRINCIPAL_BALANCE_DOLLARS="709941492.91" PRINCIPAL_BALANCE_COUNT="3095" SUBSTANDARD_VOL_DOLLARS="8677821.31" SUBSTANDARD_VOL_COUNT="38" PD_RATING_VALUES="20603" PD_RATING_COUNT="3322" BEGINNING_FARMER_FLAG_COUNT="636" SMALL_FARMER_FLAG_COUNT="1505" YOUNG_FARMER_FLAG_COUNT="580">
<Customer CIF="14338">
<BORROWER_NAME>xxxxxxxxxxxx</BORROWER_NAME>
<FIPS_CODE>49023</FIPS_CODE>
<RELATED_PARTY_LOAN_CODE>0</RELATED_PARTY_LOAN_CODE>
<RELATIONSHIP_ESTABLISH_DATE>1999-03-15</RELATIONSHIP_ESTABLISH_DATE>
<LAST_RISK_RATING_CHANGE_DATE>2012-06-12</LAST_RISK_RATING_CHANGE_DATE>
<BALANCE_SHEET_DATE>2012-11-30</BALANCE_SHEET_DATE>
<INCOME_STATEMENT_DATE>2013-12-31</INCOME_STATEMENT_DATE>
<DEBT_REPAYMENT_COVERAGE_RATIO>1.2083000000</DEBT_REPAYMENT_COVERAGE_RATIO>
<CURRENT_ASSETS>2216074.00</CURRENT_ASSETS>
<CURRENT_LIABILITIES>2036364.00</CURRENT_LIABILITIES>
<FARM_OPS_EXP>3759280.00</FARM_OPS_EXP>
<GROSS_AG_INC>3553127.00</GROSS_AG_INC>
<INT_EXP>123050.00</INT_EXP>
<NON_CURR_ASSET>14172350.00</NON_CURR_ASSET>
<NON_CURR_LIABILITIES>2491022.00</NON_CURR_LIABILITIES>
<NET_AG_INC>-206153.00</NET_AG_INC>
<NET_INC>498043.00</NET_INC>
<NET_WORTH>11861038.00</NET_WORTH>
<NONFARM_INC>704196.00</NONFARM_INC>
<TOTAL_ASSETS>16388424.00</TOTAL_ASSETS>
<TOTAL_LIABILITIES>4527386.00</TOTAL_LIABILITIES>
<DEBT_SERVICE_REQUIREMENT>514028.00</DEBT_SERVICE_REQUIREMENT>
<REPAYMENT_SOURCE>1</REPAYMENT_SOURCE>
</Customer>
</Institution>
</Provider>

# 6  
09-08-2014
Try this on your one-line file, but your mileage may vary:
Code:
sed -r 's#(</[^>]*>)#\1\n#g' file
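This command only breaks the line after closing tags, so opening tags that follow each other directly (the <?xml ...?>, <Provider ...>, <Institution ...> and <Customer ...> tags at the start of the file) would stay together on one line. A sketch of one possible extension, assuming GNU sed (-r, \r in the pattern and \n in the replacement are GNU extensions) and that no text or attribute value contains a > character; split_xml is a hypothetical name.

```shell
# Hypothetical wrapper around GNU sed; reads files or stdin.
split_xml() {
    sed -r '
        s/\r//g
        s#(</[^>]*>)#\1\n#g
        s#>(<[^/])#>\n\1#g
    ' "$@"
}
# Line 1 of the script drops carriage returns (the stray ^M at the end).
# Line 2 adds a newline after every closing tag, as in RudiC's command.
# Line 3 adds a newline between a ">" and a following opening tag.

# Usage:
#   split_xml 2014-03-31_17_V2.5.XML.utf8
```

Element text is never followed by a newline, so it stays on the same line as its opening and closing tags, which matches the expected output.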

# 7  
09-08-2014
RudiC's suggestion may work well on a Linux system, but sed is only defined to work on a text file. (Files with lines that average 9Mb/line and that do not end with a newline character are not text files.) The following should work as long as no line in your desired output file is longer than 2048 bytes:
Code:
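awk '
BEGIN {	RS = ">"	# each ">" ends a record
	ORS = ">\n"	# put the ">" back, followed by a newline
}
{	gsub(/\r/, "")	# drop carriage returns (the stray ^M at the end)
}
NF == 0 {	next	# skip empty records such as the trailing newline
}
{	$1 = $1		# squeeze runs of blanks/newlines, keep all fields
}
!/^</ {	out = out ">" $0	# text plus closing tag: join it to its open tag
	next
}
{	if (out != "")	# a new tag starts: flush the pending line
		print out
	out = $0
}
END {	if (out != "")
		print out
}' 2014-03-31_17_V2.5.XML.utf8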
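Putting the pieces of this thread together (iconv conversion, splitting at each >, keeping element text with its tags, then diff), here is a minimal end-to-end sketch. It assumes GNU iconv, a POSIX awk, diff and mktemp, and that no text or attribute value contains a > character; split_tags and compare_xml are hypothetical names and the file names in the usage line are placeholders. It keeps the whole record ($0) after squeezing whitespace, so attributes such as PROVIDER="17" stay on their tag's line.

```shell
# Sketch only: split_tags and compare_xml are hypothetical helper names.

# Convert one UTF-16 file to UTF-8 and print one tag per line, keeping
# element text on the same line as its opening and closing tags.
split_tags() {
    iconv -f UTF-16 -t UTF-8 "$1" | awk '
        BEGIN   { RS = ">"; ORS = ">\n" }
        { gsub(/\r/, "") }                  # drop carriage returns
        NF == 0 { next }                    # skip whitespace-only records
        { $1 = $1 }                         # squeeze blanks and newlines
        !/^</   { out = out ">" $0; next }  # text plus closing tag
        { if (out != "") print out; out = $0 }
        END     { if (out != "") print out }'
}

# Normalize both files into temporary files and diff them.
# Exit status follows diff: 0 = same, 1 = different, >1 = trouble.
compare_xml() {
    old=$(mktemp) || return 2
    new=$(mktemp) || { rm -f "$old"; return 2; }
    split_tags "$1" > "$old"
    split_tags "$2" > "$new"
    diff "$old" "$new"
    status=$?
    rm -f "$old" "$new"
    return "$status"
}

# Usage (placeholder file names):
#   compare_xml previous_V2.5.XML 2014-03-31_17_V2.5.XML
```

Because both files go through the same normalization, diff reports changed elements line by line; note that a mere reordering of elements will still show up as a difference.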
