Sponsored Content
Top Forums Shell Programming and Scripting remove duplicated xml record in a file under unix Post 302089631 by happyv on Wednesday 20th of September 2006 06:34:51 AM
Old 09-20-2006
Is the Perl can run under ksh Unix?

Also, the record is a bit difference...it look like

record1:
this is testing
my id is 2001
end:
record2:
this is testing2
my id is 2002
end:
record3:
this is testing
my id is 2002
end:
record4:
this is testing2
my id is 2002
end:

For the above, record 2 and 4 is duplicated. Because of the "id" and "testing2" is the same. if only one line is the same which is not called duplicated..

So Perl or any friend can help for the script?
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

remove duplicated columns

hi all, i have a file contain multicolumns, this file is sorted by col2 and col3. i want to remove the duplicated columns if the col2 and col3 are the same in another line. example fileA AA BB CC DD CC XX CC DD BB CC ZZ FF DD FF HH HH the output is AA BB CC DD BB CC ZZ FF... (6 Replies)
Discussion started by: kamel.seg
6 Replies

2. UNIX for Dummies Questions & Answers

how to read record by record from a file in unix

Hi guys, i have a big file with the following format.This includes header(H),detail(D) and trailer(T) information in the file.My problem is i have to search for the character "6h" at 14 th and 15 th position in all the records .if it is there i have to write all those records into a... (1 Reply)
Discussion started by: raoscb
1 Replies

3. UNIX for Advanced & Expert Users

How to read an Xml record contained in a file--urgent

Hi I have an xml file which has multiple xml records.. I don't know how to read those records and pipe them to another shell command the file is like <abc>z<def>y<ghi>x........</ghi></def></abc> (1st record) <jkl>z<mno>y<pqr>x........</pqr></mno></jkl> (2nd record) Each record end... (4 Replies)
Discussion started by: aixjadoo
4 Replies

4. Shell Programming and Scripting

How to remove xml namespace from xml file using shell script?

I have an xml file: <AutoData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <Table1> <Data1 10 </Data1> <Data2 20 </Data2> <Data3 40 </Data3> <Table1> </AutoData> and I have to remove the portion xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" only. I tried using sed... (10 Replies)
Discussion started by: Gary1978
10 Replies

5. Shell Programming and Scripting

Help with remove duplicated content

Input file: hcmv-US25-2-3p hsa-3160-5 hcmv-US33 hsa-47 hcmv-UL70-3p hsa-4508 hcmv-UL70-3p hsa-4486 hcms-US25 hsa-360-5 hcms-US25 hsa-4 hcms-US25 hsa-458 hcms-US25 hsa-44812 . . Desired Output file: hcmv-US25-2-3p hsa-3160-5 hcmv-US33 hsa-47 hcmv-UL70-3p hsa-4508 hsa-4486... (3 Replies)
Discussion started by: perl_beginner
3 Replies

6. UNIX for Dummies Questions & Answers

Delete a record in a xml file using shell scripting

find pattern, delete line with pattern and 3 lines above and 8 lines below the pattern. The pattern is "isup". The entire record with starting tag <record> and ending tag </record> containing the pattern is to be deleted and the rest to be retained. <record> ... (4 Replies)
Discussion started by: sdesstp
4 Replies

7. Shell Programming and Scripting

How to remove duplicated lines?

Hi, if i have a file like this: Query=1 a a b c c c d Query=2 b b b c c e . . . (7 Replies)
Discussion started by: the_simpsons
7 Replies

8. Shell Programming and Scripting

Extract timestamp from first record in xml file and it checks if not it will replace first record

I have test.xml <emp><id>101</id><name>AAA</name><date>06/06/14 1811</date></emp> <Join><id>101</id><city>london</city><date>06/06/14 2011</date></join> <Join><id>101</id><city>new york</city><date>06/06/14 1811</date></join> <Join><id>101</id><city>sydney</city><date>06/06/14... (2 Replies)
Discussion started by: vsraju
2 Replies

9. Shell Programming and Scripting

How to remove duplicated column in a text file?

Dear all, How can I remove duplicated column in a text file? Input: LG10_PM_map_19_LEnd 1000560 G AA AA AA AA AA GG LG10_PM_map_19_LEnd 1005621 G GG GG GG AA AA GG LG10_PM_map_19_LEnd 1011214 A AA AA AA AA GG GG LG10_PM_map_19_LEnd 1011673 T TT TT TT TT CC CC... (1 Reply)
Discussion started by: huiyee1
1 Replies

10. Shell Programming and Scripting

Remove duplicated records and update last line record counts

Hi Gurus, I need to remove duplicate line in file and update TRAILER (last line) record count. the file is comma delimited, field 2 is key to identify duplicated record. I can use below command to remove duplicated. but don't know how to replace last line 2nd field to new count. awk -F","... (11 Replies)
Discussion started by: green_k
11 Replies
MARC::File::XML(3pm)					User Contributed Perl Documentation				      MARC::File::XML(3pm)

NAME
MARC::File::XML - Work with MARC data encoded as XML SYNOPSIS
## Loading with USE options use MARC::File::XML ( BinaryEncoding => 'utf8', RecordFormat => 'UNIMARC' ); ## Setting the record format without USE options MARC::File::XML->default_record_format('USMARC'); ## reading with MARC::Batch my $batch = MARC::Batch->new( 'XML', $filename ); my $record = $batch->next(); ## or reading with MARC::File::XML explicitly my $file = MARC::File::XML->in( $filename ); my $record = $file->next(); ## serialize a single MARC::Record object as XML print $record->as_xml(); ## write a bunch of records to a file my $file = MARC::File::XML->out( 'myfile.xml' ); $file->write( $record1 ); $file->write( $record2 ); $file->write( $record3 ); $file->close(); ## instead of writing to disk, get the xml directly my $xml = join( " ", MARC::File::XML::header(), MARC::File::XML::record( $record1 ), MARC::File::XML::record( $record2 ), MARC::File::XML::footer() ); DESCRIPTION
The MARC-XML distribution is an extension to the MARC-Record distribution for working with MARC21 data that is encoded as XML. The XML encoding used is the MARC21slim schema supplied by the Library of Congress. More information may be obtained here: http://www.loc.gov/standards/marcxml/ You must have MARC::Record installed to use MARC::File::XML. In fact once you install the MARC-XML distribution you will most likely not use it directly, but will have an additional file format available to you when you use MARC::Batch. This version of MARC-XML supersedes an the versions ending with 0.25 which were used with the MARC.pm framework. MARC-XML now uses MARC::Record exclusively. If you have any questions or would like to contribute to this module please sign on to the perl4lib list. More information about perl4lib is available at <http://perl4lib.perl.org>. METHODS
When you use MARC::File::XML your MARC::Record objects will have two new additional methods available to them: MARC::File::XML->default_record_format([$format]) Sets or returns the default record format used by MARC::File::XML. Valid formats are MARC21, USMARC, UNIMARC and UNIMARCAUTH. MARC::File::XML->default_record_format('UNIMARC'); as_xml() Returns a MARC::Record object serialized in XML. You can pass an optional format parameter to tell MARC::File::XML what type of record (USMARC, UNIMARC, UNIMARCAUTH) you are serializing. print $record->as_xml([$format]); as_xml_record([$format]) Returns a MARC::Record object serialized in XML without a collection wrapper. You can pass an optional format parameter to tell MARC::File::XML what type of record (USMARC, UNIMARC, UNIMARCAUTH) you are serializing. print $record->as_xml_record('UNIMARC'); new_from_xml([$encoding, $format]) If you have a chunk of XML and you want a record object for it you can use this method to generate a MARC::Record object. You can pass an optional encoding parameter to specify which encoding (UTF-8 or MARC-8) you would like the resulting record to be in. You can also pass a format parameter to specify the source record type, such as UNIMARC, UNIMARCAUTH, USMARC or MARC21. my $record = MARC::Record->new_from_xml( $xml, $encoding, $format ); Note: only works for single record XML chunks. If you want to write records as XML to a file you can use out() with write() to serialize more than one record as XML. out() A constructor for creating a MARC::File::XML object that can write XML to a file. You must pass in the name of a file to write XML to. If the $encoding parameter or the DefaultEncoding (see above) is set to UTF-8 then the binmode of the output file will be set appropriately. my $file = MARC::File::XML->out( $filename [, $encoding] ); write() Used in tandem with out() to write records to a file. my $file = MARC::File::XML->out( $filename ); $file->write( $record1 ); $file->write( $record2 ); close() When writing records to disk the filehandle is automatically closed when you the MARC::File::XML object goes out of scope. If you want to close it explicitly use the close() method. If you want to generate batches of records as XML, but don't want to write to disk you'll have to use header(), record() and footer() to generate the different portions. $xml = join( " ", MARC::File::XML::header(), MARC::File::XML::record( $record1 ), MARC::File::XML::record( $record2 ), MARC::File::XML::record( $record3 ), MARC::File::XML::footer() ); header() Returns a string of XML to use as the header to your XML file. footer() Returns a string of XML to use at the end of your XML file. record() Returns a chunk of XML suitable for placement between the header and the footer. decode() You probably don't ever want to call this method directly. If you do you should pass in a chunk of XML as the argument. It is normally invoked by a call to next(), see MARC::Batch or MARC::File. encode() You probably want to use the as_xml() method on your MARC::Record object instead of calling this directly. But if you want to you just need to pass in the MARC::Record object you wish to encode as XML, and you will be returned the XML as a scalar. TODO
o Support for callback filters in decode(). SEE ALSO
<http://www.loc.gov/standards/marcxml/> MARC::File::USMARC MARC::Batch MARC::Record AUTHORS
o Ed Summers <ehs@pobox.com> perl v5.14.2 2011-02-11 MARC::File::XML(3pm)
All times are GMT -4. The time now is 11:36 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy