Requirement is to compare 2 XML files and see if there are any differences but from some of the providers We are receiving UTF-16 formatted XML file with no end of line as shown below.
Excerpt of data file: 2014-03-31_17_V2.5.XML [readonly][noeol][converted] 2L, 18676154C
I used iconv command to convert this file to UTF-8 formatted file.
Now i can see the data in the XML file visible to human eyes but everything is coming out as a single line.
How could i put of end of lines after each XML tag?
once i align the XML tags in my data file with end of line characters, then i want to do DIFF between two XML files to find the differences. please help.
Hi there,
I have 2 machines running HP-UX. One off these controllers is able to send mail and the other cannot. I have looked at all the settings that I know and coannot find any differences. Is there a way to audit the 2 machinces by pulling all the settings then compare any differences?
... (2 Replies)
Hi
Hope you are having a great weeknd !! I had a question and need your expertise for this :
I have 2 files File1 & File2(of same structure) which I need to compare on some columns. I need to find the values which are there in File2 but not in File 1 and put the Differences in another file... (5 Replies)
Hello,
I need to find all *.xml files that matched by pattern on Linux. I need to have written the file name on the screen and then change the pattern in the file just was found.
For instance.
I can start the script with arguments for keyword and for value, i.e
script.sh keyword... (1 Reply)
I have this file
427 A C A/C 12
436 G C G/C 12
445 C T C/T 12
447 A G A/G 9
451 T C T/C 5
456 A G A/G 12
493 G A G/A 12
I wanted to read the first column and find all other ids which are differences less than 10.
427 A C A/C 12 436
436 G C G/C 12 427,445... (7 Replies)
Hi All,
I am trying to validate XMLs from a folder:
Input Directory having multiple XML files:
File1.xml
<Root>
<Parent>
<Child Name="One">
<Foo>...</Foo>
<Bar>...</Bar>
<Baz>...</Baz>
</Child>
<Child Name="Two">
<Foo>...</Foo>... (3 Replies)
Hi....I'm having 2 xml files, one is having some special characters and another is a clean xml file does not have any special characters. Now I need one audit kind of file which will show me only from which line the special characters have been removed and the special characters.
Can you please... (1 Reply)
Hello everybody,
I have a double mission with some XML files, which is pretty challenging for my actual beginner UNIX knowledge. I need to extract some strings from multiple XML files and create a new XML file with the searched strings..
The original XML files contain the source code for... (12 Replies)
Hi Everyone,
I'm new here and I was checking this old post:
/shell-programming-and-scripting/180669-splitting-file-into-several-smaller-files-using-perl.html
(cannot paste link because of lack of points)
I need to do something like this but understand very little of perl.
I also check... (4 Replies)
Hi,
I'm having a xml file with multiple xml header. so i want to split the file into multiple files.
Sample.xml consists multiple headers so how can we split these multiple headers into multiple files in unix.
eg :
<?xml version="1.0" encoding="UTF-8"?>
<ml:individual... (3 Replies)
I am trying to find the differences between the two sorted, tab separated, attached files. Thank you :).
In update2 there are 52,058 lines and in current2 there are 52,197 so 139 differences should result.
However,
awk 'FNR==NR{a;next}!($0 in a)' update2 current2 > out2comm -1 -3... (2 Replies)
Discussion started by: cmccabe
2 Replies
LEARN ABOUT REDHAT
xml::um
XML::UM(3) User Contributed Perl Documentation XML::UM(3)NAME
XML::UM - Convert UTF-8 strings to any encoding supported by XML::Encoding
SYNOPSIS
use XML::UM;
# Set directory with .xml files that comes with XML::Encoding distribution
# Always include the trailing slash!
$XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/';
# Create the encoding routine
my $encode = XML::UM::get_encode (
Encoding => 'ISO-8859-2',
EncodeUnmapped => &XML::UM::encode_unmapped_dec);
# Convert a string from UTF-8 to the specified Encoding
my $encoded_str = $encode->($utf8_str);
# Remove circular references for garbage collection
XML::UM::dispose_encoding ('ISO-8859-2');
DESCRIPTION
This module provides methods to convert UTF-8 strings to any XML encoding that XML::Encoding supports. It creates mapping routines from the
.xml files that can be found in the maps/ directory in the XML::Encoding distribution. Note that the XML::Encoding distribution does
install the .enc files in your perl directory, but not the.xml files they were created from. That's why you have to specify $ENCDIR as in
the SYNOPSIS.
This implementation uses the XML::Encoding class to parse the .xml file and creates a hash that maps UTF-8 characters (each consisting of
up to 4 bytes) to their equivalent byte sequence in the specified encoding. Note that large mappings may consume a lot of memory!
Future implementations may parse the .enc files directly, or do the conversions entirely in XS (i.e. C code.)
get_encode (Encoding => STRING, EncodeUnmapped => SUB)
The central entry point to this module is the XML::UM::get_encode() method. It forwards the call to the global $XML::UM::FACTORY, which is
defined as an instance of XML::UM::SlowMapperFactory by default. Override this variable to plug in your own mapper factory.
The XML::UM::SlowMapperFactory creates an instance of XML::UM::SlowMapper (and caches it for subsequent use) that reads in the .xml encod-
ing file and creates a hash that maps UTF-8 characters to encoded characters.
The get_encode() method of XML::UM::SlowMapper is called, finally, which generates an anonimous subroutine that uses the hash to convert
multi-character UTF-8 blocks to the proper encoding.
dispose_encoding ($encoding_name)
Call this to free the memory used by the SlowMapper for a specific encoding. Note that in order to free the big conversion hash, the user
should no longer have references to the subroutines generated by get_encode().
The parameters to the get_encode() method (defined as name/value pairs) are:
o Encoding
The name of the desired encoding, e.g. 'ISO-8859-2'
o EncodeUnmapped (Default: &XML::UM::encode_unmapped_dec)
Defines how Unicode characters not found in the mapping file (of the specified encoding) are printed. By default, they are converted
to decimal entity references, like '{'
Use &XML::UM::encode_unmapped_hex for hexadecimal constants, like '«'
CAVEATS
I'm not exactly sure about which Unicode characters in the range (0 .. 127) should be mapped to themselves. See comments in XML/UM.pm near
%DEFAULT_ASCII_MAPPINGS.
The encodings that expat supports by default are currently not supported, (e.g. UTF-16, ISO-8859-1), because there are no .enc files avail-
able for these encodings. This module needs some more work. If you have the time, please help!
AUTHOR
Send bug reports, hints, tips, suggestions to Enno Derksen at <enno@att.com>.
perl v5.8.0 2000-02-17 XML::UM(3)