Sponsored Content
Top Forums Shell Programming and Scripting Remove lines that are subsets of other lines in File Post 302941908 by Chubler_XL on Wednesday 22nd of April 2015 04:47:24 PM
Old 04-22-2015
This solution keeps the lines with most IPs first although not optimal it should give fairly more concise answers than just processing file top to bottom.

I tried it on a data files with 170K lines and it took approx 3 seconds to process.

Code:
awk '
function have(ln,ip,num) {
  ret=1
  num=split(ln,ips)
  for(ip=1;ip<=num;ip++)
     if(!(ips[ip] in havelist)) {
         havelist[ips[ip]]
         unique++
         ret=0
      }
  return ret
}
{ L[NR]=$0;D[NF]=D[NF] " " NR; max=NF>max?NF:max }
END {
   for(c=max;c;c--)
      if(D[c]) {
         lns=split(D[c],v)
         for(i=1;i<=lns;i++)
           if(!have(L[v[i]])) print L[v[i]]
      }
   printf "Data file contains %'\''d unique IPs\n", unique > "/dev/stderr"
}' infile


For anyone who wants to test possible solutions I used the script to make my test file:

Code:
for ((i=0;i<170000;i++))
do
    printf "10.0.%d.%d" $((RANDOM%256)) $((RANDOM%256))
    ips=$((RANDOM%8))
    while ((--ips > 0))
    do
       printf " 10.0.%d.%d" $((RANDOM%256)) $((RANDOM%256))
    done
    printf "\n"
done


Last edited by Chubler_XL; 04-22-2015 at 05:53 PM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

remove lines from file

file: 1 xxxxxxx 2 xxx xxx 5 xxx xxx ... 180 xxxxxx 200 xxx how to remove any lines with the first number range 1-180 (9 Replies)
Discussion started by: bluemoon1
9 Replies

2. UNIX for Dummies Questions & Answers

vi to remove lines in file

All, I have a text file with several entries like below: personname personname.domain.com I know there is a way to use vi to remove only the personname.domain.com line. Can someone help? I believe that it involves /s/g/ something...I just can't remember the exact syntax. Thanks (2 Replies)
Discussion started by: kjbaumann
2 Replies

3. Shell Programming and Scripting

remove lines from file

Hi gurus, i'm trying to remove a number of lines from a large file using the following command: sed '1,5000d' oldfile > newfile Somehow the lines in the old file are not deleted... Am I doing this wrongly? Any suggestions? :confused: Thanks! :) wee (10 Replies)
Discussion started by: lweegp
10 Replies

4. Shell Programming and Scripting

remove : lines from file

A small question I have a test.txt file I have contents as: a:google b:yahoo : c:facebook : d:hotmail How do I remove the line with : my output should be a:google b:yahoo c:facebook d:hotmail (5 Replies)
Discussion started by: aronmelon
5 Replies

5. Shell Programming and Scripting

remove blank lines and merge lines in shell

Hi, I'm not a expert in shell programming, so i've come here to take help from u gurus. I'm trying to tailor a csv file that i got to make it work for the LOAD FROM command. I've a datatable csv of the below format - --in file format xx,xx,xx ,xx , , , , ,,xx, xxxx,, ,, xxx,... (11 Replies)
Discussion started by: dvah
11 Replies

6. Shell Programming and Scripting

Remove lines from file

Hey Gang- I have a list of servers. I want to exclude servers that begin with and end with certain characters. Is there an easy command to do this? Example wvm1234dev wvm1234pro uvm1122dev uvm1122bku uvm1344dev I want to exclude any lines that start with "wvm" OR "uvm" AND end... (7 Replies)
Discussion started by: idiotboy
7 Replies

7. UNIX for Dummies Questions & Answers

Want to remove all lines but not latest 50 lines from a file

Hi, I have a huge file which has Lacs of lines. File system got full. I want your guys help to suggest me a solution so that I can remove all lines from that file but not last 50,000 lines. I want solution which can remove lines from existing file so that I can have some space left with. (28 Replies)
Discussion started by: prashant2507198
28 Replies

8. Shell Programming and Scripting

Remove lines in file

I have a file that contains the following: Party_Id1;Party_id2;Party_id3; 1;2;3; 0 0 4;5;6; 0 7;8;9; How can I adjust the file so it looks like this: Party_Id1;Party_id2;Party_id3; 1;2;3; 4;5;6; 7;8;9; I Think the '0' is something like a carriage return, I don't know. But how... (2 Replies)
Discussion started by: katled
2 Replies

9. Shell Programming and Scripting

Two files, remove lines from second based on lines in first

I have two files, a keepout.txt and a database.csv. They're unsorted, but could be sorted. keepout: user1 buser3 anuser19 notheruser27 database: user1,2343,"information about",field,blah,34 user2,4231,"mo info",etc,stuff,43 notheruser27,4344,"hiya",thing,more thing,423... (4 Replies)
Discussion started by: esoffron
4 Replies

10. Shell Programming and Scripting

awk to remove lines that do not start with digit and combine line or lines

I have been searching and trying to come up with an awk that will perform the following on a converted text file (original is a pdf). 1. Since the first two lines are (begin with) text they are removed 2. if $1 is a number then all text is merged (combined) into one line until the next... (3 Replies)
Discussion started by: cmccabe
3 Replies
RANDOM(4)						     Linux Programmer's Manual							 RANDOM(4)

NAME
random, urandom - kernel random number source devices DESCRIPTION
The character special files /dev/random and /dev/urandom (present since Linux 1.3.30) provide an interface to the kernel's random number generator. File /dev/random has major device number 1 and minor device number 8. File /dev/urandom has major device number 1 and minor device number 9. The random number generator gathers environmental noise from device drivers and other sources into an entropy pool. The generator also keeps an estimate of the number of bit of the noise in the entropy pool. From this entropy pool random numbers are created. When read, the /dev/random device will only return random bytes within the estimated number of bits of noise in the entropy pool. /dev/random should be suitable for uses that need very high quality randomness such as one-time pad or key generation. When the entropy pool is empty, reads to /dev/random will block until additional environmental noise is gathered. When read, /dev/urandom device will return as many bytes as are requested. As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver. Knowledge of how to do this is not available in the current non-classified literature, but it is theoretically possible that such an attack may exist. If this is a concern in your application, use /dev/random instead. CONFIGURING
If your system does not have /dev/random and /dev/urandom created already, they can be created with the following commands: mknod -m 644 /dev/random c 1 8 mknod -m 644 /dev/urandom c 1 9 chown root:root /dev/random /dev/urandom When a Linux system starts up without much operator interaction, the entropy pool may be in a fairly predictable state. This reduces the actual amount of noise in the entropy pool below the estimate. In order to counteract this effect, it helps to carry entropy pool informa- tion across shut-downs and start-ups. To do this, add the following lines to an appropriate script which is run during the Linux system start-up sequence: echo "Initializing kernel random number generator..." # Initialize kernel random number generator with random seed # from last shut-down (or start-up) to this start-up. Load and # then save 512 bytes, which is the size of the entropy pool. if [ -f /var/random-seed ]; then cat /var/random-seed >/dev/urandom fi dd if=/dev/urandom of=/var/random-seed count=1 Also, add the following lines in an appropriate script which is run during the Linux system shutdown: # Carry a random seed from shut-down to start-up for the random # number generator. Save 512 bytes, which is the size of the # random number generator's entropy pool. echo "Saving random seed..." dd if=/dev/urandom of=/var/random-seed count=1 FILES
/dev/random /dev/urandom AUTHOR
The kernel's random number generator was written by Theodore Ts'o (tytso@athena.mit.edu). SEE ALSO
mknod (1) RFC 1750, "Randomness Recommendations for Security" Linux 1997-08-01 RANDOM(4)
All times are GMT -4. The time now is 06:34 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy