Remove lines that are subsets of other lines in File
Hello everyone,
Although it seems easy, I've been stuck on this problem for a while now and I can't figure out a way to get it done.
My problem is the following:
I have a file where each line is a sequence of IP addresses, for example:
What I'd like to do is remove lines that are completely matched in other lines. In the previous example, "Line 1" would be deleted as it is contained in "Line 2".
So far, I've worked with Python and set() objects to get the job done, but I've got more than 100K lines and the set lookups become more and more time-consuming as the program runs :/
Thanks for your help
Moderator's Comments:
Use code tags, thanks.
Last edited by zaxxon; 04-22-2015 at 06:31 AM..
Reason: code tags and missing a dot
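One possible approach, sketched in Python since that is what the poster already uses: treat each line as a set of whitespace-separated addresses and build an inverted index from address to line numbers, so that each line is only compared against the lines sharing its rarest address instead of against all 100K lines. This is a sketch under assumptions, not a definitive answer: it assumes the order of addresses within a line does not matter and that exact duplicates should collapse to their first occurrence. The name `remove_subset_lines` is made up for illustration.

```python
from collections import defaultdict


def remove_subset_lines(lines):
    """Return the lines whose address set is not contained in another line's set.

    Assumption: addresses are whitespace-separated and their order is irrelevant.
    """
    token_sets = [frozenset(line.split()) for line in lines]

    # Inverted index: address -> indices of the lines that contain it.
    index = defaultdict(set)
    for i, tokens in enumerate(token_sets):
        for tok in tokens:
            index[tok].add(i)

    kept = []
    for i, tokens in enumerate(token_sets):
        if not tokens:
            continue  # blank lines are dropped
        # Any superset of this line must contain its rarest address,
        # so only those lines need to be checked.
        rarest = min(tokens, key=lambda t: len(index[t]))
        redundant = False
        for j in index[rarest]:
            if j == i:
                continue
            other = token_sets[j]
            # Proper subset, or an exact duplicate of an earlier line.
            if tokens < other or (tokens == other and j < i):
                redundant = True
                break
        if not redundant:
            kept.append(lines[i])
    return kept
```

If "contained" is meant as a contiguous, ordered run of addresses rather than a set, the subset test would have to be replaced by a sequence match; the candidate filtering through the index would still apply.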
All,
I have a text file with several entries like below:
personname
personname.domain.com
I know there is a way to use vi to remove only the personname.domain.com line. Can someone help? I believe that it involves /s/g/ something...I just can't remember the exact syntax.
Thanks (2 Replies)
Hi gurus,
I'm trying to remove a number of lines from a large file using the following command:
sed '1,5000d' oldfile > newfile
Somehow the lines in the old file are not deleted...
Am I doing this wrong? Any suggestions?
Thanks!
wee (10 Replies)
A small question
I have a test.txt file with the following contents:
a:google
b:yahoo
:
c:facebook
:
d:hotmail
How do I remove the lines that contain only ":"?
My output should be:
a:google
b:yahoo
c:facebook
d:hotmail (5 Replies)
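A minimal Python sketch of the filtering asked for here, assuming the unwanted lines consist of nothing but a single ":" (possibly with surrounding whitespace). The function name `drop_colon_only_lines` is hypothetical.

```python
def drop_colon_only_lines(lines):
    """Keep every line except those made up solely of ':' (ignoring whitespace)."""
    return [line for line in lines if line.strip() != ":"]


if __name__ == "__main__":
    content = ["a:google", "b:yahoo", ":", "c:facebook", ":", "d:hotmail"]
    print("\n".join(drop_colon_only_lines(content)))
```

On the command line, `grep -v '^:$' test.txt` applies the same filter to lines that are exactly ":".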
Hi,
I'm not an expert in shell programming, so I've come here to get help from you gurus.
I'm trying to tailor a CSV file that I received so that it works with the LOAD FROM command.
I have a data table CSV in the format below:
--in file format
xx,xx,xx ,xx , , , , ,,xx,
xxxx,, ,, xxx,... (11 Replies)
Hey Gang-
I have a list of servers. I want to exclude servers that begin with and end with certain characters. Is there an easy command to do this?
Example
wvm1234dev
wvm1234pro
uvm1122dev
uvm1122bku
uvm1344dev
I want to exclude any lines that start with "wvm" OR "uvm" AND end... (7 Replies)
Hi,
I have a huge file with lakhs (hundreds of thousands) of lines, and the file system is full.
I'd like your help with a solution that removes all lines from that file except the last 50,000. It has to remove the lines from the existing file so that some space is freed. (28 Replies)
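One way this could be done without a temporary copy is sketched below in Python. It assumes the last 50,000 lines fit comfortably in memory and that no other process is writing to the file at the same time; the function name `keep_last_lines` is made up for illustration. Note the trade-off: because the file is rewritten in place, an interruption during the write loses data.

```python
from collections import deque


def keep_last_lines(path, count):
    """Rewrite the file at `path` so that only its last `count` lines remain."""
    # A deque with maxlen discards the oldest line whenever a new one
    # arrives, so at most `count` lines are held in memory.
    with open(path, "rb") as src:
        tail = deque(src, maxlen=count)

    # Opening with "wb" truncates the existing file, then the tail is
    # written back into it; no second file is created.
    with open(path, "wb") as dst:
        dst.writelines(tail)
```

A call such as `keep_last_lines("huge.log", 50000)` would apply it, where `huge.log` is a placeholder file name.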
I have a file that contains the following:
Party_Id1;Party_id2;Party_id3;
1;2;3;
0
0
4;5;6;
0
7;8;9;
How can I adjust the file so it looks like this:
Party_Id1;Party_id2;Party_id3;
1;2;3;
4;5;6;
7;8;9;
I think the '0' is something like a carriage return, but I don't know. But how... (2 Replies)
I have two files, a keepout.txt and a database.csv. They're unsorted, but could be sorted.
keepout:
user1
buser3
anuser19
notheruser27
database:
user1,2343,"information about",field,blah,34
user2,4231,"mo info",etc,stuff,43
notheruser27,4344,"hiya",thing,more thing,423... (4 Replies)
I have been searching and trying to come up with an awk that will perform the following on a converted text file (original is a pdf).
1. Since the first two lines are (begin with) text they are removed
2. if $1 is a number then all text is merged (combined) into one line until the next... (3 Replies)
Perl::Critic::Policy::InputOutput::RequireBriefOpen(3pm)    User Contributed Perl Documentation    Perl::Critic::Policy::InputOutput::RequireBriefOpen(3pm)
NAME
Perl::Critic::Policy::InputOutput::RequireBriefOpen - Close filehandles as soon as possible after opening them.
AFFILIATION
This Policy is part of the core Perl::Critic distribution.
DESCRIPTION
One way that production systems fail unexpectedly is by running out of filehandles. Filehandles are a finite resource on every operating
system that I'm aware of, and running out of them is virtually impossible to recover from. The solution is to not run out in the first
place. What causes programs to run out of filehandles? Usually, it's leaks: you open a filehandle and forget to close it, or just wait a
really long time before closing it.
This problem is rarely exposed by test systems, because the tests rarely run long enough or have enough load to hit the filehandle limit.
So, the best way to avoid the problem is 1) always close all filehandles that you open and 2) close them as soon as is practical.
This policy takes note of calls to "open()" where there is no matching "close()" call within "N" lines of code. If you really need to do a
lot of processing on an open filehandle, then you can move that processing to another method like this:
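    use Carp qw(croak);                # provides croak()
    use English qw(-no_match_vars);    # provides $OS_ERROR

    sub process_data_file {
        my ($self, $filename) = @_;
        # Open, hand the handle to a helper, and close within a few lines
        open my $fh, '<', $filename
            or croak 'Failed to read datafile ' . $filename . '; ' . $OS_ERROR;
        $self->_parse_input_data($fh);
        close $fh;
        return;
    }

    sub _parse_input_data {
        my ($self, $fh) = @_;
        # The lengthy processing lives here, away from the open/close pair
        while (my $line = <$fh>) {
            ...
        }
        return;
    }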
As a special case, this policy also allows code to return the filehandle after the "open" instead of closing it. Just like the close,
however, that "return" has to be within the right number of lines. From there, you're on your own to figure out whether the code is
promptly closing the filehandle.
The STDIN, STDOUT, and STDERR handles are exempt from this policy.
CONFIGURATION
This policy allows "close()" invocations to be up to "N" lines after their corresponding "open()" calls, where "N" defaults to 9. You can
override this to set it to a different number with the "lines" setting. To do this, put entries in a .perlcriticrc file like this:
    [InputOutput::RequireBriefOpen]
    lines = 5
CAVEATS
"IO::File->new"
This policy only looks for explicit "open" calls. It does not detect calls to "CORE::open" or "IO::File->new" or the like.
Is it the right lexical?
We don't currently check for redeclared filehandles. So the following code is a false negative, for example, because the outer scoped
filehandle is not closed:
    open my $fh, '<', $file1 or croak;
    if (open my $fh, '<', $file2) {
        print <$fh>;
        close $fh;
    }
This is a contrived example, but it isn't uncommon for people to use $fh for the name of the filehandle every time. Perhaps it's time to
think of better variable names...
CREDITS
Initial development of this policy was supported by a grant from the Perl Foundation.
AUTHOR
Chris Dolan <cdolan@cpan.org>
COPYRIGHT
Copyright (c) 2007-2011 Chris Dolan. Many rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. The full text of this license
can be found in the LICENSE file included with this module.
perl v5.14.2 2012-06-07 Perl::Critic::Policy::InputOutput::RequireBriefOpen(3pm)