That's a nice solution from the point of view of the human who has to read it; it's simple and succinct. However, it's a massively inefficient algorithm. If you have a half-million random numbers out of a million, the number of comparisons will be on the order of 250 billion (500000^2). Yikes!
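The proposal itself isn't quoted above, but it was presumably something along these lines (the file names and the -x flag are my guesses), where every line of the complete file is tested against every pattern line:

```shell
# Hypothetical reconstruction of the grep-based proposal: print lines of
# the complete file that exactly match no line of the partial file.
# With m patterns and n input lines this is on the order of m*n regex tests.
seq 5 > /tmp/complete.txt
printf '%s\n' 1 3 5 > /tmp/partial.txt
grep -vxf /tmp/partial.txt /tmp/complete.txt
```

On half a million patterns against a million-line file, that multiplies out to the hundreds of billions of comparisons described above.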
For kicks, I tried it on an old laptop (PII 350 MHz, 192 MB ram, 256 MB swap) and it blew up. Well, not quite, but, after 43 seconds, grep (GNU grep 2.5.1) was killed by the kernel after having consumed all ram and swap.
For comparison, a diff on both files took 3 seconds and change.
Regards,
Alister
The following should print the missing numbers within the sequence and beyond up to n:
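The code block didn't survive the quoting here; a sketch of what such an awk program might look like (the file name and the value of n are placeholders):

```shell
# Walk the sorted partial list once; print any gap inside the sequence,
# then whatever is missing beyond the last entry, up to n.
printf '%s\n' 1 3 5 > /tmp/partial.txt
awk -v n=7 '
    { while (++i < $1) print i }      # missing within the sequence
    END { while (i++ < n) print i }   # missing beyond the last number
' /tmp/partial.txt
```

For the sample file 1, 3, 5 with n=7, this prints 2, 4, 6, 7: a single pass, no sorting, no quadratic scanning.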
Regards,
Alister
Yes, but diff tends to throw in human-friendly artifacts and does a fair bit of searching, so for data processing and control, the old sort/merge of time immemorial keeps the overhead minimized. The pipes also make it pipeline-parallel: two sorts feeding comm, which feeds a final sort, is four-way parallel.
I mentioned diff only to demonstrate that the job could be handled more efficiently than the grep proposal. However, default diff output would be trivial to massage, and more efficient than comm and multiple sorts:
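For example, with the diff-sed pipeline timed later in this thread: numbers present only in the complete file show up in default diff output as "> N" lines, and sed strips the marker.

```shell
# Numbers only in the complete file appear in diff output as "> N";
# sed deletes the "> " prefix and prints just those lines.
printf '%s\n' 1 3 4 > /tmp/partial.txt
printf '%s\n' 1 2 3 4 5 > /tmp/complete.txt
diff /tmp/partial.txt /tmp/complete.txt | sed -n '/^> /s///p'
```

Here the output is 2 and 5, the numbers missing from the partial file.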
Regards,
Alister
On the ancient (single core, single threaded) laptop whose specs I mentioned in my first post in this thread:
67 seconds: comm-sort
11 seconds: diff-sed
03 seconds: awk (from post #8)
A multicore machine will no doubt close the gap, but nonetheless, all that sorting for a list of already sorted numbers is unnecessary work.
Well, sort/comm is robust. I consistently turn to it for 'set' problems, even if I have to number the input lines in order to reassemble them afterward. I agree that some of diff's output formats might be pretty easy to massage, but what if the data gets less trivial?
Did you test with 500K numbers ?
The times reported previously reflect a file with ~1M numbers. To be precise, from the sequence 1..1000000, I removed 2 and 999999.
I also tested with a file that had approximately 500,000 of 1,000,000 numbers. The missing numbers were removed "randomly" using:
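The exact command didn't make it into the quote; one plausible way to do it (the 0.5 keep probability is my assumption) is:

```shell
# Keep each of 1..1000000 with probability ~0.5, leaving roughly
# 500,000 of the numbers, with the gaps in random positions.
seq 1000000 | awk 'BEGIN { srand() } rand() < 0.5' > /tmp/partial.txt
```

The actual command used may have differed; this just produces a file of the shape described.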
Run time increased for the awk and diff-sed approaches. I assume this is because the increased output i/o (from 2 lines to ~500,000) outweighed the savings from reading a smaller input file (which shrank from 1M to ~0.5M lines). Run time for comm-sort decreased; I assume the sort overhead saved on the much smaller file outweighed the extra output i/o.
47 seconds: comm -23 <(sort complete) <(sort partial) | sort -n
17 seconds: diff partial complete | sed -n '/^> /s///p'
05 seconds: awk (from post #8)
I agree with you that comm/sort is robust. The behavior is more deterministic. However, I prefer an AWK one-liner for this task. It too is robust. And, at a minor cost in increased complexity, it's much more efficient.
. . . However, it's a massively inefficient algorithm. If you have a half-million random numbers out of a million, the number of comparisons will be on the order of 250 billion (500000^2). Yikes!
. . .
I posted that proposal because it was so easy to read and understand, and because I assumed the numbers to compare against were sparse. I did have some misgivings when I imagined grep having to run through every number in the file for each input line...