How to remove duplicate text blocks from a file?


 
# 1  
Old 05-06-2015

Hi All

I have a set of files that contain duplicated blocks of text. Below is a sample of one file, with the sensitive information removed.
Every block starts with <TR BGCOLOR="white"> and ends with an IP address followed by two closing HTML tags, like this:
Code:
10.14.22.22
</TD>
</TR>

A block can be duplicated several times in a file. What I need is to go through the file and remove only the duplicated blocks.
Because it is an HTML file, the rest of the file must keep its format, and only the blocks between these tags should be evaluated.

I have tried many code samples (sed, awk and python), but all of them end up removing other content from the file (such as other HTML tags).

Thanks in advance for any help

Code:
<TR BGCOLOR="white">
<TD>30Apr2015</TD>
<TD>17:39:08</TD>
<TD>NAME</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> XX - </H3> <br> SOME DATA HERE<br></TD>

<TD>p111111</TD>
</TR>

<TR BGCOLOR="white">
<TD>1May2015</TD>
<TD>9:06:34</TD>
<TD>NAME2</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> YY </H3> <br> SOME OTHER DATA HERE.<br></TD>

<TD>p222222</TD>
<TD>
10.14.22.22
</TD>
</TR>


<TR BGCOLOR="white">
<TD>30Apr2015</TD>
<TD>17:39:08</TD>
<TD>NAME</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> XX - </H3> <br> SOME DATA HERE<br></TD>

<TD>p111111</TD>
</TR>

<TR BGCOLOR="white">
<TD>1May2015</TD>
<TD>9:06:34</TD>
<TD>NAME2</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> YY </H3> <br> SOME OTHER DATA HERE.<br></TD>

<TD>p222222</TD>
<TD>
10.14.22.22
</TD>
</TR>


<TR BGCOLOR="white">
<TD>30Apr2015</TD>
<TD>04:39:10</TD>
<TD>NAME3</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> ZZ </H3> <br> SOME OTHER DATA XXXX HERE.<br></TD>

<TD>p333333</TD>
<TD>
10.14.33.33
</TD>
</TR>


Last edited by Don Cragun; 05-06-2015 at 04:50 AM. Reason: Add CODE and ICODE tags.
# 2  
Old 05-06-2015
Please use code tags as required by forum rules!

For identical records, and with two empty lines as the record separator as in your example, this might work:
Code:
 awk '!T[$0]++' RS="\n\n\n" ORS="\n\n\n" file
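
A note on the one-liner above: gawk and mawk treat a multi-character RS as a regular expression, but POSIX leaves that case unspecified, so other awks may behave differently. It also compares everything between two-empty-line separators as one record, which in the sample is sometimes two <TR> blocks together. The following is only a sketch of an alternative, assuming every block starts on a line beginning with <TR BGCOLOR="white"> and ends on a line beginning with </TR>, as in the sample. The function name dedup_tr_blocks is made up for illustration.

Code:
```shell
# Hypothetical helper: drop repeated <TR BGCOLOR="white"> ... </TR> blocks,
# pass every other line through unchanged.
dedup_tr_blocks() {
  awk '
    /^<TR BGCOLOR="white">/ { inblock = 1; block = "" }  # a new block starts here
    inblock {
      block = block $0 "\n"                    # collect the lines of the block
      if ($0 ~ /^<\/TR>/) {                    # closing tag: block is complete
        inblock = 0
        if (!seen[block]++) printf "%s", block # print only the first copy
      }
      next
    }
    { print }                                  # lines outside blocks are kept
    END { if (inblock) printf "%s", block }    # flush an unterminated block
  ' "$@"
}
```

Usage would be something like dedup_tr_blocks file > file.new. Empty lines between blocks are printed as they are, so a removed duplicate leaves its surrounding empty lines behind.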

# 3  
Old 05-06-2015
Thanks RudiC

This is what I was after, a simple solution.
If possible, can you explain what this part of the code does?

Code:
'!T[$0]++'

# 4  
Old 05-07-2015
T is an array indexed by the entire record $0. An element is created when it is first referenced, and its initial empty value evaluates to FALSE. Negating it gives TRUE, so the default action (print) is executed. Because T[$0] is post-incremented, on every later occurrence of the same record the negation evaluates to FALSE and the record is not printed again.
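
The explanation above can be spelled out in long-hand awk. This is just an illustration on plain lines with the default record separator; the function name dedup_lines is made up and is not part of the original solution.

Code:
```shell
# Long-hand equivalent of awk '!T[$0]++' (dedup_lines is an illustrative name)
dedup_lines() {
  awk '{
    if (T[$0] == 0) print   # unset element compares equal to 0: first time seen
    T[$0]++                 # count this occurrence so later copies are skipped
  }' "$@"
}

# Small demonstration: prints a, b and c, one per line
printf 'a\nb\na\nc\nb\n' | dedup_lines
```

The first occurrence of each line is kept and the input order is preserved, which is the same behaviour the one-liner gives per record.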