remove duplicated xml record in a file under unix


 
# 8  
Old 09-20-2006
/^end:/ { ... ; next }
Select end of record line (line starting with 'end:'), execute '...' code and read next line.

if (! (Record in Records)) { ... }
If the record definition has not been memorized in the Records array, execute '...' code.
The code prints the full record (label, definition, end) and memorizes the record definition.

Records[Record];
Create an element in the array Records. The index of this element is the record definition.
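
The membership test and the bare array reference can be tried in isolation. This is only a minimal sketch: the function name in_demo, the array name Seen and the key text are made up for the demonstration.

```shell
in_demo() {
    awk 'BEGIN {
        key = "some record definition"            # hypothetical key
        print ((key in Seen) ? "seen" : "new")    # the "in" test does not create the element
        Seen[key]                                 # a bare reference creates it
        print ((key in Seen) ? "seen" : "new")
    }'
}
in_demo
```

The first print gives new, the second gives seen.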

print RecordLabel ":"; print Record; print $0;
Print the full record: label, definition and end.

Record = "";
Reset the Record definition.

$1 ~ /^.*:/ { ... ; next}
Select start of record (line whose field 1 contains ':', normally as its last character), execute '...' code and read next line.

sub(/:.*/, "", $1);
RecordLabel = $1;

The record label is memorized in the RecordLabel variable.
It is equal to all characters before ':' in field 1.
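
A quick way to see what the sub() call leaves in field 1. This is a sketch only: the helper name label_of and the two input lines are invented for illustration.

```shell
label_of() {
    # For each input line, strip everything from the first ":" in field 1
    # and print what remains (the label).
    awk '{ sub(/:.*/, "", $1); print $1 }'
}

printf 'rec1:\nname:value rest\n' | label_of
```

This prints rec1 and name. Because the pattern removes everything from the first ':' onward, a field such as name:value is also reduced to name.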

{ ... }
Select record definition line, execute '...' code.

Record = (Record ? Record "\n" : "") $0;
Append the line just read ($0) to the variable Record, where the previous lines are memorized.
A line separator is added first if a line has already been memorized.
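
Put back together, the fragments above give roughly the following script. Treat it as a sketch reassembled from the explanation, not as the exact script from the earlier post. The wrapper name dedup_records, the file name records.txt and the sample records are assumptions, and the input layout (a label line, definition lines, then an 'end:' line) is inferred from the description.

```shell
# Hypothetical sample input: rec1 and rec2 have the same definition.
cat > records.txt <<'EOF'
rec1:
<a>1</a>
<b>2</b>
end:
rec2:
<a>1</a>
<b>2</b>
end:
rec3:
<a>9</a>
end:
EOF

dedup_records() {
    awk '
    /^end:/ {                        # end of record
        if (!(Record in Records)) {  # definition not seen before
            Records[Record]          # memorize it
            print RecordLabel ":"
            print Record
            print $0
        }
        Record = ""                  # reset for the next record
        next
    }
    $1 ~ /^.*:/ {                    # start of record: keep the label
        sub(/:.*/, "", $1)
        RecordLabel = $1
        next
    }
    {                                # definition line: append to Record
        Record = (Record ? Record "\n" : "") $0
    }
    ' "$1"
}

dedup_records records.txt
```

With this sample, rec1 and rec3 are printed in full and rec2 is dropped, since its definition is the same as that of rec1. Note that a definition line whose first field contains ':' would be taken for a label line by the second rule.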



Jean-Pierre.
# 9  
Old 09-20-2006
A modification of ranj@chn's code, adapted to work for you:
Code:
paste -s -d"\t\t\t\n" f | sort -u -k2 | sort -k1 |tr "\t" "\n"
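
To see what the pipeline does, it can be run on a small file. This is a sketch with invented sample data. The delimiter list "\t\t\t\n" joins the input four lines at a time, so every record is assumed to span exactly four lines (label, two definition lines, end) and to contain no tab characters. The function name dedup_by_paste is hypothetical.

```shell
# Hypothetical sample input: rec1 and rec2 have the same definition.
cat > f <<'EOF'
rec1:
<a>1</a>
<b>2</b>
end:
rec2:
<a>1</a>
<b>2</b>
end:
rec3:
<a>9</a>
<b>9</b>
end:
EOF

dedup_by_paste() {
    # 1. paste: join each group of four lines into one tab-separated line
    # 2. sort -u -k2: keep one line per distinct definition (field 2 to end of line)
    # 3. sort -k1: put the surviving records back in label order
    # 4. tr: turn the tabs back into newlines
    paste -s -d"\t\t\t\n" "$1" | sort -u -k2 | sort -k1 | tr "\t" "\n"
}

dedup_by_paste f
```

Two records remain: rec3 and one of rec1 or rec2. Which of the two duplicates survives depends on the sort implementation.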
