Post 302943166 by mahasona on Wednesday 6th of May 2015 02:22:30 AM
How to remove duplicate text blocks from a file?

Hi All

I have a set of files that contain duplicated blocks of text. Below is a sample of one such file; I have removed the sensitive information from it.
Every block starts with <TR BGCOLOR="white"> and ends with an IP address followed by two closing HTML tags, like this:
Code:
10.14.22.22
</TD>
</TR>

A block can be duplicated several times in a file. I need to go through the file and remove only the duplicated blocks.
Since this is an HTML file, the rest of the file must keep its format; only the blocks within these tags should be evaluated.

I have tried many code samples (sed, awk and Python), but all of them end up removing other content from the file (such as other HTML tags).

Thanks in advance for any help.

Code:
<TR BGCOLOR="white">
<TD>30Apr2015</TD>
<TD>17:39:08</TD>
<TD>NAME</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> XX - </H3> <br> SOME DATA HERE<br></TD>

<TD>p111111</TD>
</TR>

<TR BGCOLOR="white">
<TD>1May2015</TD>
<TD>9:06:34</TD>
<TD>NAME2</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> YY </H3> <br> SOME OTHER DATA HERE.<br></TD>

<TD>p222222</TD>
<TD>
10.14.22.22
</TD>
</TR>


<TR BGCOLOR="white">
<TD>30Apr2015</TD>
<TD>17:39:08</TD>
<TD>NAME</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> XX - </H3> <br> SOME DATA HERE<br></TD>

<TD>p111111</TD>
</TR>

<TR BGCOLOR="white">
<TD>1May2015</TD>
<TD>9:06:34</TD>
<TD>NAME2</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> YY </H3> <br> SOME OTHER DATA HERE.<br></TD>

<TD>p222222</TD>
<TD>
10.14.22.22
</TD>
</TR>


<TR BGCOLOR="white">
<TD>30Apr2015</TD>
<TD>04:39:10</TD>
<TD>NAME3</TD>
<TD>firewall_policy</TD>
<TD>fw_policies</TD>
<TD>Modify Object</TD>
<TD><H3> ZZ </H3> <br> SOME OTHER DATA XXXX HERE.<br></TD>

<TD>p333333</TD>
<TD>
10.14.33.33
</TD>
</TR>


Last edited by Don Cragun; 05-06-2015 at 04:50 AM.. Reason: Add CODE and ICODE tags.
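
One possible way to approach the request above is sketched below in Python. It is only a sketch under a few assumptions that the post does not confirm: each block opens with a line containing just <TR BGCOLOR="white"> and closes at the next line containing just </TR>, blocks are not nested, and two blocks count as duplicates when their non-blank lines match after trimming whitespace. The function name dedupe_blocks is made up for illustration. The first copy of each block is kept, and every line outside a block is passed through unchanged.

Code:
```python
import re

# Assumed block delimiters, taken from the sample in the post.
BLOCK_START = re.compile(r'^\s*<TR BGCOLOR="white">\s*$', re.IGNORECASE)
BLOCK_END = re.compile(r'^\s*</TR>\s*$', re.IGNORECASE)


def dedupe_blocks(lines):
    """Return lines with repeated <TR BGCOLOR="white">...</TR> blocks removed.

    The first occurrence of each block is kept. Lines outside such blocks
    are passed through untouched, so the rest of the HTML keeps its format.
    """
    out = []
    seen = set()
    block = None  # None while outside a block, else the lines collected so far

    for line in lines:
        if block is None:
            if BLOCK_START.match(line):
                block = [line]      # a new block begins here
            else:
                out.append(line)    # ordinary line, copy as-is
            continue

        block.append(line)
        if BLOCK_END.match(line):
            # Compare blocks by their non-blank lines, ignoring edge whitespace.
            key = tuple(l.strip() for l in block if l.strip())
            if key not in seen:
                seen.add(key)
                out.extend(block)
            block = None

    if block is not None:
        # Unterminated block at end of input: keep it rather than lose data.
        out.extend(block)
    return out
```

To try it on a file, read the file with readlines(), pass the list to dedupe_blocks, and write the result to a new file with writelines(). Blank lines that separated removed blocks are left in place, since they sit outside the blocks; whether to collapse them is a separate decision.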
 
