Remove lines that are subsets of other lines in File
Post 302941908 by Chubler_XL on Wednesday 22nd of April 2015 04:47:24 PM
This solution processes the lines with the most IPs first. Although not optimal, it should give a more concise result than just processing the file top to bottom.

I tried it on a data file with 170K lines and it took approximately 3 seconds to process.

Code:
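awk '
# Return 1 if every IP on line ln has already been seen; otherwise record
# the new IPs in havelist and return 0. ret, ips, ip and num are locals.
function have(ln,   ret,ips,ip,num) {
  ret=1
  num=split(ln,ips)
  for(ip=1;ip<=num;ip++)
     if(!(ips[ip] in havelist)) {
         havelist[ips[ip]]
         unique++
         ret=0
      }
  return ret
}
# Store every line and group line numbers by field count (number of IPs).
{ L[NR]=$0;D[NF]=D[NF] " " NR; max=NF>max?NF:max }
END {
   # Visit lines from most IPs to fewest; print a line only if it adds a new IP.
   for(c=max;c;c--)
      if(D[c]) {
         lns=split(D[c],v)
         for(i=1;i<=lns;i++)
           if(!have(L[v[i]])) print L[v[i]]
      }
   # The quote flag below groups digits by thousands; not every awk supports it.
   printf "Data file contains %'\''d unique IPs\n", unique > "/dev/stderr"
}' infile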
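
Here is a minimal sketch of the same approach run on a tiny made-up sample, which may help show what gets kept. The wrapper name filter_ips and the sample addresses are assumptions for illustration only, and the stderr summary is left out. Note that a line is dropped once all of its IPs have already been seen on earlier-processed lines, even if no single line contains them all.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (name not from the original post): reads lines of
# space-separated IPs on stdin and prints the lines that survive.
filter_ips() {
    awk '
    # Returns 1 when every IP on the line was already seen, else 0.
    function have(ln,   ret, ips, ip, num) {
        ret = 1
        num = split(ln, ips)
        for (ip = 1; ip <= num; ip++)
            if (!(ips[ip] in havelist)) { havelist[ips[ip]]; ret = 0 }
        return ret
    }
    # Remember each line and bucket its line number by field count.
    { L[NR] = $0; D[NF] = D[NF] " " NR; if (NF > max) max = NF }
    END {
        # Longest lines first; keep a line only if it adds a new IP.
        for (c = max; c; c--)
            if (D[c]) {
                lns = split(D[c], v)
                for (i = 1; i <= lns; i++)
                    if (!have(L[v[i]])) print L[v[i]]
            }
    }'
}

# Illustrative sample data (made up for this example).
printf '%s\n' \
    '10.0.0.1 10.0.0.2' \
    '10.0.0.1 10.0.0.2 10.0.0.3' \
    '10.0.0.3' \
    '10.0.0.4' | filter_ips
# prints:
# 10.0.0.1 10.0.0.2 10.0.0.3
# 10.0.0.4
```

The three-IP line is handled first, so the two shorter lines it covers are skipped, and 10.0.0.4 is kept because it adds a new address.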


For anyone who wants to test possible solutions, I used this script to generate my test file:

Code:
for ((i=0;i<170000;i++))
do
    printf "10.0.%d.%d" $((RANDOM%256)) $((RANDOM%256))
    ips=$((RANDOM%8))
    while ((--ips > 0))
    do
       printf " 10.0.%d.%d" $((RANDOM%256)) $((RANDOM%256))
    done
    printf "\n"
done
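
One way to sanity-check a candidate solution against a test file like this is to confirm that the filtered output still contains exactly the same set of IPs as the input. The following is only a sketch under that assumption; the helper name same_ip_set and the file name outfile are hypothetical and not part of the original post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exits 0 when both files contain the same set of
# whitespace-separated IPs, and 1 otherwise.
same_ip_set() {
    awk '
    # First file: collect every IP it contains.
    FNR == NR { for (i = 1; i <= NF; i++) in_ips[$i]; next }
    # Second file: collect every IP it contains.
    { for (i = 1; i <= NF; i++) out_ips[$i] }
    END {
        # Fail if either file has an IP the other one lacks.
        for (ip in in_ips)  if (!(ip in out_ips)) exit 1
        for (ip in out_ips) if (!(ip in in_ips))  exit 1
    }' "$1" "$2"
}

# Example usage, assuming the filtered output was saved to a file named outfile:
#   same_ip_set infile outfile && echo "all IPs preserved"
```

This only checks that no IP was lost or invented; it says nothing about whether the output is as short as it could be.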


Last edited by Chubler_XL; 04-22-2015 at 05:53 PM.
 
