Filter or remove duplicate block of text without distinguishing marks or fields
Post 302563514 by radoulov on Tuesday 11th of October 2011 12:05:57 PM
This should handle multiple trailing newlines (multiple leading newlines should already be handled):
Code:
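awk 'END {
  # print every unique block in first-seen order
  for (i = 0; ++i <= idx;)
    printf "%s\n", p[i]
  # print the text after the last $newpage unless it equals the last stored block
  if (p[i - 1] != r)
    print r
  }
/\$newpage/ {
    # squeeze any run of trailing newlines into a single one
    sub(/\n\n*$/, "\n", r)
    # store the block only the first time it is seen
    t[r]++ || p[++idx] = r
    # x is never assigned, so this resets r to the empty string
    r = x; next
    }
{
  # append the line to the current block; leading blank lines are skipped
  r = r ? r RS $0 : $0
  }' infile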
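
A quick way to try this out, as a rough sketch: the function name dedup_blocks and the sample data are made up for illustration, and the short-circuit is written as an explicit if, which should behave the same way. Blocks 1 and 3 of the sample differ only in how many blank lines trail them.

```shell
#!/bin/sh
# dedup_blocks is a hypothetical wrapper; the awk body follows the script above.
dedup_blocks() {
  awk 'END {
    for (i = 0; ++i <= idx;)
      printf "%s\n", p[i]
    if (p[i - 1] != r)
      print r
  }
  /\$newpage/ {
    sub(/\n\n*$/, "\n", r)
    if (!t[r]++) p[++idx] = r   # same effect as: t[r]++ || p[++idx] = r
    r = x; next
  }
  {
    r = r ? r RS $0 : $0
  }' "$1"
}

# Made-up sample: three blocks closed by $newpage lines, then a trailing block.
sample=$(mktemp)
printf '%s\n' alpha beta '' '$newpage' \
              gamma '' '$newpage' \
              alpha beta '' '' '' '$newpage' \
              delta > "$sample"

dedup_blocks "$sample"
rm -f "$sample"
```

With this input the second alpha/beta block is dropped, so the output is alpha, beta, a blank line, gamma, a blank line, and delta. Note that the $newpage lines themselves are not printed.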

Let me know how it goes!