remove duplicated xml record in a file under unix


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting remove duplicated xml record in a file under unix
# 1  
Old 09-20-2006
remove duplicated xml record in a file under unix

Hi,

If i have a file with xml format, i would like to remove duplicated records and save to a new file. Is it possible...to write script to do it?
# 2  
Old 09-20-2006
Try
Code:
uniq inputfile

# 3  
Old 09-20-2006
I don't know if it's possible in shell or not, but it's possible in Perl. Do consider that option if you can.
# 4  
Old 09-20-2006
Is the Perl can run under ksh Unix?

Also, the record is a bit difference...it look like

record1:
this is testing
my id is 2001
end:
record2:
this is testing2
my id is 2002
end:
record3:
this is testing
my id is 2002
end:
record4:
this is testing2
my id is 2002
end:

For the above, record 2 and 4 is duplicated. Because of the "id" and "testing2" is the same. if only one line is the same which is not called duplicated..

So Perl or any friend can help for the script?
# 5  
Old 09-20-2006
check this

I havent tested this, but please check it
Code:
paste -s -d"\t\t\t\n" filename|sort -u |tr "\t" "\n"


Last edited by ranj@chn; 09-20-2006 at 08:54 AM.. Reason: mistake in command
# 6  
Old 09-20-2006
You can try to use awk.
Create the following awk script uniq.awk :
Code:
/^end:/ {
   if (! (Record in Records)) {
      Records[Record];
      print RecordLabel ":";
      print Record;
      print $0;  
      Record = "";
   }
   next;
}
$1 ~ /^.*:/ {
   sub(/:.*/, "", $1);
   RecordLabel = $1;
   next;
}
{
   Record = (Record ? Record "\n" : "") $0;
}

and execute it :
Code:
$ awk -f uniq.awk filename
record1:
this is testing
my id is 2001
end:
record2:
this is testing2
my id is 2002
end:
record3:
this is testing
my id is 2002
end:
$

jean-Pierre.
# 7  
Old 09-20-2006
Dear Sir,

It would be great help if you can describe the code below in detail, I have just started to learn about awk and I can say that understanding of following code in a clear way would help me a lot in future.
Quote:
/^end:/ {
if (! (Record in Records)) {
Records[Record];
print RecordLabel ":";
print Record;
print $0;
Record = "";
}
next;
}
$1 ~ /^.*:/ {
sub(/:.*/, "", $1);
RecordLabel = $1;
next;
}
{
Record = (Record ? Record "\n" : "") $0;
}
Thanks in advance.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove duplicated records and update last line record counts

Hi Gurus, I need to remove duplicate line in file and update TRAILER (last line) record count. the file is comma delimited, field 2 is key to identify duplicated record. I can use below command to remove duplicated. but don't know how to replace last line 2nd field to new count. awk -F","... (11 Replies)
Discussion started by: green_k
11 Replies

2. Shell Programming and Scripting

How to remove duplicated column in a text file?

Dear all, How can I remove duplicated column in a text file? Input: LG10_PM_map_19_LEnd 1000560 G AA AA AA AA AA GG LG10_PM_map_19_LEnd 1005621 G GG GG GG AA AA GG LG10_PM_map_19_LEnd 1011214 A AA AA AA AA GG GG LG10_PM_map_19_LEnd 1011673 T TT TT TT TT CC CC... (1 Reply)
Discussion started by: huiyee1
1 Replies

3. Shell Programming and Scripting

Extract timestamp from first record in xml file and it checks if not it will replace first record

I have test.xml <emp><id>101</id><name>AAA</name><date>06/06/14 1811</date></emp> <Join><id>101</id><city>london</city><date>06/06/14 2011</date></join> <Join><id>101</id><city>new york</city><date>06/06/14 1811</date></join> <Join><id>101</id><city>sydney</city><date>06/06/14... (2 Replies)
Discussion started by: vsraju
2 Replies

4. Shell Programming and Scripting

How to remove duplicated lines?

Hi, if i have a file like this: Query=1 a a b c c c d Query=2 b b b c c e . . . (7 Replies)
Discussion started by: the_simpsons
7 Replies

5. UNIX for Dummies Questions & Answers

Delete a record in a xml file using shell scripting

find pattern, delete line with pattern and 3 lines above and 8 lines below the pattern. The pattern is "isup". The entire record with starting tag <record> and ending tag </record> containing the pattern is to be deleted and the rest to be retained. <record> ... (4 Replies)
Discussion started by: sdesstp
4 Replies

6. Shell Programming and Scripting

Help with remove duplicated content

Input file: hcmv-US25-2-3p hsa-3160-5 hcmv-US33 hsa-47 hcmv-UL70-3p hsa-4508 hcmv-UL70-3p hsa-4486 hcms-US25 hsa-360-5 hcms-US25 hsa-4 hcms-US25 hsa-458 hcms-US25 hsa-44812 . . Desired Output file: hcmv-US25-2-3p hsa-3160-5 hcmv-US33 hsa-47 hcmv-UL70-3p hsa-4508 hsa-4486... (3 Replies)
Discussion started by: perl_beginner
3 Replies

7. Shell Programming and Scripting

How to remove xml namespace from xml file using shell script?

I have an xml file: <AutoData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <Table1> <Data1 10 </Data1> <Data2 20 </Data2> <Data3 40 </Data3> <Table1> </AutoData> and I have to remove the portion xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" only. I tried using sed... (10 Replies)
Discussion started by: Gary1978
10 Replies

8. UNIX for Advanced & Expert Users

How to read an Xml record contained in a file--urgent

Hi I have an xml file which has multiple xml records.. I don't know how to read those records and pipe them to another shell command the file is like <abc>z<def>y<ghi>x........</ghi></def></abc> (1st record) <jkl>z<mno>y<pqr>x........</pqr></mno></jkl> (2nd record) Each record end... (4 Replies)
Discussion started by: aixjadoo
4 Replies

9. UNIX for Dummies Questions & Answers

how to read record by record from a file in unix

Hi guys, i have a big file with the following format.This includes header(H),detail(D) and trailer(T) information in the file.My problem is i have to search for the character "6h" at 14 th and 15 th position in all the records .if it is there i have to write all those records into a... (1 Reply)
Discussion started by: raoscb
1 Replies

10. Shell Programming and Scripting

remove duplicated columns

hi all, i have a file contain multicolumns, this file is sorted by col2 and col3. i want to remove the duplicated columns if the col2 and col3 are the same in another line. example fileA AA BB CC DD CC XX CC DD BB CC ZZ FF DD FF HH HH the output is AA BB CC DD BB CC ZZ FF... (6 Replies)
Discussion started by: kamel.seg
6 Replies
Login or Register to Ask a Question