How to find differences between 2 XML files?


 
# 8  
Old 09-09-2014
Quote:
Originally Posted by Don Cragun
RudiC's suggestion may work well on a Linux system, but sed is only defined to work on a text file. (Files with lines that average 9Mb/line and that do not end with a newline character are not text files.) The following should work as long as no line in your desired output file is longer than 2048 bytes:
Code:
awk '
BEGIN {	RS = ">"
	ORS = ">\n"
}
!/^</ {	out = out ">" $1
	next
}
{	print out
	out = $1
}
END {	print out
}' 2014-03-31_17_V2.5.XML.utf8

I tried your code, but it is removing some of the content, for example:
<?xml version="1.0" encoding="UTF-16" ?>
<Provider PROVIDER="17" SCHEMA_VERSION="2.5">
CIF="14338">
xxxxxxxxxxxx DAIRY FARMS INC</BORROWER_NAME>

Code:
>
<?xml>
<Provider>
<Institution>
<Customer>
<BORROWER_NAME>xxxxxxxxx>
<FIPS_CODE>49023</FIPS_CODE>
<RELATED_PARTY_LOAN_CODE>0</RELATED_PARTY_LOAN_CODE>
<RELATIONSHIP_ESTABLISH_DATE>1999-03-15</RELATIONSHIP_ESTABLISH_DATE>
<LAST_RISK_RATING_CHANGE_DATE>2012-06-12</LAST_RISK_RATING_CHANGE_DATE>
<BALANCE_SHEET_DATE>2012-11-30</BALANCE_SHEET_DATE>
<INCOME_STATEMENT_DATE>2013-12-31</INCOME_STATEMENT_DATE>
<DEBT_REPAYMENT_COVERAGE_RATIO>1.2083000000</DEBT_REPAYMENT_COVERAGE_RATIO>
<CURRENT_ASSETS>2216074.00</CURRENT_ASSETS>
<CURRENT_LIABILITIES>2036364.00</CURRENT_LIABILITIES>
<FARM_OPS_EXP>3759280.00</FARM_OPS_EXP>
<GROSS_AG_INC>3553127.00</GROSS_AG_INC>
<INT_EXP>123050.00</INT_EXP>
<NON_CURR_ASSET>14172350.00</NON_CURR_ASSET>
<NON_CURR_LIABILITIES>2491022.00</NON_CURR_LIABILITIES>
<NET_AG_INC>-206153.00</NET_AG_INC>
<NET_INC>498043.00</NET_INC>
<NET_WORTH>11861038.00</NET_WORTH>
<NONFARM_INC>704196.00</NONFARM_INC>
<TOTAL_ASSETS>16388424.00</TOTAL_ASSETS>
<TOTAL_LIABILITIES>4527386.00</TOTAL_LIABILITIES>
<DEBT_SERVICE_REQUIREMENT>514028.00</DEBT_SERVICE_REQUIREMENT>
<REPAYMENT_SOURCE>1</REPAYMENT_SOURCE>

# 9  
Old 09-09-2014
What input is giving you that output?
# 10  
Old 09-09-2014
Quote:
Originally Posted by Don Cragun
What input is giving you that output?
Below is an excerpt of my input file. It is one single line because, as you know, there are no EOL characters.

Code:
<?xml version="1.0" encoding="UTF-16" ?><Provider PROVIDER="x" SCHEMA_VERSION="2.5"><Institution UNINUM="xxxx" EXTRACT_DATE="2013-12-31" CUSTOMER_ROW_COUNT="1577" LOAN_ROW_COUNT="3322" BOOK_VALUE_DOLLARS="720163381.46" BOOK_VALUE_COUNT="3115" PAST_DUE_AMOUNT_DOLLARS="3630254.00" PAST_DUE_AMOUNT_COUNT="23" ACCEPTABLE_VOL_DOLLARS="693647325.79" ACCEPTABLE_VOL_COUNT="3058" ACCRUED_INTEREST_DOLLARS="10221888.55" ACCRUED_INTEREST_COUNT="2877" DOUBTFUL_VOL_DOLLARS="2374.32" DOUBTFUL_VOL_COUNT="3" OAEM_VOL_DOLLARS="17835860.04" OAEM_VOL_COUNT="16" PRINCIPAL_BALANCE_DOLLARS="709941492.91" PRINCIPAL_BALANCE_COUNT="3095" SUBSTANDARD_VOL_DOLLARS="8677821.31" SUBSTANDARD_VOL_COUNT="38" PD_RATING_VALUES="20603" PD_RATING_COUNT="3322" BEGINNING_FARMER_FLAG_COUNT="636" SMALL_FARMER_FLAG_COUNT="1505" YOUNG_FARMER_FLAG_COUNT="580"><Customer CIF="14338"><BORROWER_NAME>xxxxxxxxx DAIRY FARMS INC</BORROWER_NAME><FIPS_CODE>49023</FIPS_CODE><RELATED_PARTY_LOAN_CODE>0</RELATED_PARTY_LOAN_CODE><RELATIONSHIP_ESTABLISH_DATE>1999-03-15</RELATIONSHIP_ESTABLISH_DATE><LAST_RISK_RATING_CHANGE_DATE>2012-06-12</LAST_RISK_RATING_CHANGE_DATE><BALANCE_SHEET_DATE>2012-11-30</BALANCE_SHEET_DATE><INCOME_STATEMENT_DATE>2013-12-31</INCOME_STATEMENT_DATE><DEBT_REPAYMENT_COVERAGE_RATIO>1.2083000000</DEBT_REPAYMENT_COVERAGE_RATIO><CURRENT_ASSETS>2216074.00</CURRENT_ASSETS><CURRENT_LIABILITIES>2036364.00</CURRENT_LIABILITIES><FARM_OPS_EXP>3759280.00</FARM_OPS_EXP><GROSS_AG_INC>3553127.00</GROSS_AG_INC><INT_EXP>123050.00</INT_EXP><NON_CURR_ASSET>14172350.00</NON_CURR_ASSET><NON_CURR_LIABILITIES>2491022.00</NON_CURR_LIABILITIES><NET_AG_INC>-206153.00</NET_AG_INC><NET_INC>498043.00</NET_INC><NET_WORTH>11861038.00</NET_WORTH><NONFARM_INC>704196.00</NONFARM_INC><TOTAL_ASSETS>16388424.00</TOTAL_ASSETS><TOTAL_LIABILITIES>4527386.00</TOTAL_LIABILITIES><DEBT_SERVICE_REQUIREMENT>514028.00</DEBT_SERVICE_REQUIREMENT><REPAYMENT_SOURCE>1</REPAYMENT_SOURCE><Loan LOAN_NUMBER="3583040101"><BRANCH>RICHFIELD

# 11  
Old 09-09-2014
Sorry about that, try this:
Code:
awk '
BEGIN {	FS = RS = ">"
	ORS = ">\n"
}
!/^</ {	out = out ">" $1
	next
}
{	print out
	out = $1
}
END {	print out
}' 2014-03-31_17_V2.5.XML.utf8
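
The only change from the earlier version is the field separator. With the default FS, $1 is just the first whitespace-delimited word of each record, so everything after the first blank (attributes, the rest of a name) is dropped. With FS set to the same ">" as RS, a record can never contain the separator, so $1 is the whole record. Here is a minimal sketch of that difference; the `sample` string and the `split_tags` helper are made up for illustration and are not part of the real data:

```shell
# Made-up sample: one long line with no trailing newline, like the real input.
sample='<a x="1" y="2"><b>some text</b></a>'

# split_tags FS_VALUE: run the same awk logic with the given field separator.
split_tags() {
	printf '%s' "$sample" | awk -v fs="$1" '
	BEGIN {	FS = fs; RS = ">"; ORS = ">\n" }
	!/^</ {	out = out ">" $1; next }   # text record: append it to the pending tag
	{	print out; out = $1 }      # tag record: flush the pending line, start a new one
	END {	print out }'
}

split_tags ' '   # default FS: only the first word of each record survives
split_tags '>'   # FS = RS: each record is kept whole
```

The first call should reduce the sample to lines like `<a>` and `<b>some>`, which is the same kind of damage reported above; the second call keeps the attributes and the full text. Both print a lone `>` as the first line, because `out` is still empty when the first tag is seen.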

# 12  
Old 09-10-2014
Quote:
Originally Posted by Don Cragun
Sorry about that, try this:
Code:
awk '
BEGIN {	FS = RS = ">"
	ORS = ">\n"
}
!/^</ {	out = out ">" $1
	next
}
{	print out
	out = $1
}
END {	print out
}' 2014-03-31_17_V2.5.XML.utf8

It looks good now. Below is an excerpt from the resulting XML file. Thanks a lot.


Code:
>
<?xml version="1.0" encoding="UTF-16" ?>
<Provider PROVIDER="xx" SCHEMA_VERSION="2.5">
<Institution UNINUM="xxxxxx" EXTRACT_DATE="2013-12-31" CUSTOMER_ROW_COUNT="1577" LOAN_ROW_COUNT="3322" BOOK_VALUE_DOLLARS="720163381.46" BOOK_VALUE_COUNT="3115" PAST_DUE_AMOUNT_DOLLARS="3630254.00" PAST_DUE_AMOUNT_COUNT="23" ACCEPTABLE_VOL_DOLLARS="693647325.79" ACCEPTABLE_VOL_COUNT="3058" ACCRUED_INTEREST_DOLLARS="10221888.55" ACCRUED_INTEREST_COUNT="2877" DOUBTFUL_VOL_DOLLARS="2374.32" DOUBTFUL_VOL_COUNT="3" OAEM_VOL_DOLLARS="17835860.04" OAEM_VOL_COUNT="16" PRINCIPAL_BALANCE_DOLLARS="709941492.91" PRINCIPAL_BALANCE_COUNT="3095" SUBSTANDARD_VOL_DOLLARS="8677821.31" SUBSTANDARD_VOL_COUNT="38" PD_RATING_VALUES="20603" PD_RATING_COUNT="3322" BEGINNING_FARMER_FLAG_COUNT="636" SMALL_FARMER_FLAG_COUNT="1505" YOUNG_FARMER_FLAG_COUNT="580">
<Customer CIF="xxx">
<BORROWER_NAME>xxxxx</BORROWER_NAME>
<FIPS_CODE>49023</FIPS_CODE>
<RELATED_PARTY_LOAN_CODE>0</RELATED_PARTY_LOAN_CODE>
<RELATIONSHIP_ESTABLISH_DATE>1999-03-15</RELATIONSHIP_ESTABLISH_DATE>
<LAST_RISK_RATING_CHANGE_DATE>2012-06-12</LAST_RISK_RATING_CHANGE_DATE>
<BALANCE_SHEET_DATE>2012-11-30</BALANCE_SHEET_DATE>
<INCOME_STATEMENT_DATE>2013-12-31</INCOME_STATEMENT_DATE>
<DEBT_REPAYMENT_COVERAGE_RATIO>1.2083000000</DEBT_REPAYMENT_COVERAGE_RATIO>
<CURRENT_ASSETS>2216074.00</CURRENT_ASSETS>
<CURRENT_LIABILITIES>2036364.00</CURRENT_LIABILITIES>
<FARM_OPS_EXP>3759280.00</FARM_OPS_EXP>
<GROSS_AG_INC>3553127.00</GROSS_AG_INC>
<INT_EXP>123050.00</INT_EXP>
<NON_CURR_ASSET>14172350.00</NON_CURR_ASSET>
<NON_CURR_LIABILITIES>2491022.00</NON_CURR_LIABILITIES>
<NET_AG_INC>-206153.00</NET_AG_INC>
<NET_INC>498043.00</NET_INC>
<NET_WORTH>11861038.00</NET_WORTH>
<NONFARM_INC>704196.00</NONFARM_INC>
<TOTAL_ASSETS>16388424.00</TOTAL_ASSETS>
<TOTAL_LIABILITIES>4527386.00</TOTAL_LIABILITIES>
<DEBT_SERVICE_REQUIREMENT>514028.00</DEBT_SERVICE_REQUIREMENT>
<REPAYMENT_SOURCE>1</REPAYMENT_SOURCE>
<Loan LOAN_NUMBER="3583040101">
<BRANCH>RICHFIELD                                         </BRANCH>
<INT_RATE_PRODUCT>6</INT_RATE_PRODUCT>
<LOAN_OFFICER>ROBERT WHEELER                                              </LOAN_OFFICER>
<YOUNG_FARMER_FLAG>0</YOUNG_FARMER_FLAG>
<LOSS_GIVEN_DEFAULT>A</LOSS_GIVEN_DEFAULT>
<TIL_FLAG>0</TIL_FLAG>
<PERFORMANCE_CLASS>1</PERFORMANCE_CLASS>
<BEGINNING_FARMER_FLAG>0</BEGINNING_FARMER_FLAG>
<LOAN_TYPE>1</LOAN_TYPE>
<SMALL_FARMER_FLAG>0</SMALL_FARMER_FLAG>
<ACCEPTABLE_VOL>836460.01</ACCEPTABLE_VOL>
<ACCRUED_INTEREST>0.00</ACCRUED_INTEREST>
<BOOK_VALUE>836460.01</BOOK_VALUE>
<BORROWER_CATEGORY>2</BORROWER_CATEGORY>
<BORROWER_ENTITY>3</BORROWER_ENTITY>
<COMMIT_CURRENT>836460.01</COMMIT_CURRENT>
<COMMIT_UNDISBURSED>0.00</COMMIT_UNDISBURSED>
<COST_OF_FUNDS>0.0150500000</COST_OF_FUNDS>
<DATE_ORIGINATED>2000-07-21</DATE_ORIGINATED>
<DOUBTFUL_VOL>0.00</DOUBTFUL_VOL>
<GOVT_GUARANTEE_AMT>0.00</GOVT_GUARANTEE_AMT>
<INT_RATE>0.0450000000</INT_RATE>
<LENDER>1</LENDER>
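
Once both files are split this way, finding the differences should come down to an ordinary line-by-line comparison. This is only a rough sketch: the file names `old.xml` and `new.xml`, their contents, and the `xml_lines` helper are hypothetical and are created here just so the example is self-contained. The 2048-byte limit on output lines mentioned earlier still applies.

```shell
# Hypothetical sample inputs: single-line XML without a trailing newline.
printf '%s' '<r><n>A</n><v>1</v></r>' > old.xml
printf '%s' '<r><n>A</n><v>2</v></r>' > new.xml

# xml_lines FILE: write FILE with one tag (plus any text that follows it) per line.
xml_lines() {
	awk '
	BEGIN {	FS = RS = ">"; ORS = ">\n" }
	!/^</ {	out = out ">" $1; next }
	{	print out; out = $1 }
	END {	print out }' "$1"
}

xml_lines old.xml > old.split
xml_lines new.xml > new.split

# diff exits with status 1 when the files differ, which is not an error here.
diff old.split new.split || true
```

For the two sample files this should report a change on line 4 only (the `<v>` element), since the lone `>` on line 1 is the same in both outputs.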
