Alternative for wc -l


 
# 8  
Old 10-15-2010
Command                                     Time
wc -l testfile                              0m0.488s
wc -l < testfile                            0m0.493s
gawk 'END{printf ("%d\n", NR)}' testfile    0m5.220s
mawk 'END{printf ("%d\n", NR)}' testfile    0m2.133s
sed -n '$=' testfile                        0m14.833s
grep -c \$ testfile                         0m29.044s
wcc testfile                                0m4.932s
fwcl < testfile                             0m1.202s

Last edited by Scrutinizer; 10-15-2010 at 06:40 PM..
# 9  
Old 10-15-2010
Lol, OK, forget my program :)
# 10  
Old 10-15-2010
A newline can fall on any byte, so counting lines means a brute-force scan. For a file that keeps growing, a hybrid approach avoids rescanning the part already counted:
  1. Capture the nominal file size using ls -l or the like.
  2. Use head -c size <file | wc -l to count the lines up to that byte count, even if the file has grown since the ls -l.
  3. Report the total.
  4. Save both the size and the line total.
  5. Next time, capture the nominal file size again using ls -l or the like.
  6. Calculate the size delta (new size minus saved size).
  7. Use tail -c +$((old_size + 1)) <file | head -c delta | wc -l to count just the new lines up to that byte count, even if the file has grown since the ls -l. (tail -c +N starts output at byte N, so the +1 skips exactly the old_size bytes already counted.)
  8. Add the new lines to the saved line total.
  9. Report the total.
  10. Save the new size and total lines for the next time.

Seeks by byte count are fast.
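
The ten steps above could be sketched roughly as the shell function below. This is only an illustration, not code from the thread: the function name count_lines_incremental and the one-line "size total" state file format are made up for the example, and wc -c stands in for parsing ls -l because its output is easier to use in arithmetic. It also assumes a head that supports -c, as the steps do.

```shell
#!/bin/sh
# count_lines_incremental FILE STATE   (hypothetical helper)
# STATE holds "bytes lines" from the previous run; it is created on first use.
count_lines_incremental() {
    file=$1
    state=$2
    old_size=0
    old_lines=0
    if [ -r "$state" ]; then
        read -r old_size old_lines < "$state"
    fi

    # Nominal size right now; $(( )) also strips the padding some wc versions add.
    new_size=$(( $(wc -c < "$file") ))

    # A file smaller than last time was truncated or rotated: start over.
    if [ "$new_size" -lt "$old_size" ]; then
        old_size=0
        old_lines=0
    fi

    delta=$(( new_size - old_size ))

    # Skip the bytes already counted, then cap at delta so that data
    # appended while we run is left for the next call.
    new_lines=$(( $(tail -c +"$(( old_size + 1 ))" "$file" | head -c "$delta" | wc -l) ))

    total=$(( old_lines + new_lines ))
    printf '%s %s\n' "$new_size" "$total" > "$state"
    printf '%s\n' "$total"
}
```

Calling it as count_lines_incremental /var/log/app.log /tmp/app.state (both paths are placeholders) prints the running total each time. An unfinished last line is not counted until its newline arrives, because only newline bytes inside the measured range are counted.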
# 11  
Old 10-15-2010
Yes, you can usually count on a purpose-built utility being faster than something bodged in a shell or string language! :)
# 12  
Old 10-15-2010
The only compromise in "wc -l" is that it probably uses buffered FILE* I/O. That could be rewritten to use raw read() or, faster still but more dangerous to system throughput, mmap64() built with a 64-bit compiler (mapping a huge file can flush everything else in RAM to backing store). It is still brute force.

That is why I mentioned the 'storage of earlier byte counts' method. You could have it run all day, periodically updating the byte total report.

If the app logged as it wrote, every N lines or N seconds, whichever came first, then you could just tail the log.

If it is not our code, you could even write a pass-through logger and use a (possibly named) pipe to get access to the output stream, if the app can be configured for output to a (possibly named) pipe.
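
A pass-through logger of that kind might look something like the sketch below. It is a guess at the idea rather than anything posted in the thread: passthru_count is a made-up name, it reports every N lines only (not every N seconds), and it keeps just the latest count in the file it is given.

```shell
#!/bin/sh
# passthru_count COUNTFILE EVERY   (hypothetical helper)
# Copies stdin to stdout unchanged and rewrites COUNTFILE with the
# running line count every EVERY lines and once more at end of input.
passthru_count() {
    awk -v count_file="$1" -v every="$2" '
        { print }                                  # pass the line through
        NR % every == 0 {
            print NR > count_file                  # ">" truncates on each reopen
            close(count_file)
        }
        END { print NR > count_file }              # final total
    '
}
```

With a named pipe it could be wired up along the lines of mkfifo /tmp/app.pipe; passthru_count /tmp/app.count 1000 < /tmp/app.pipe > app.log & (all paths are placeholders), with the app configured to write to /tmp/app.pipe. Reading /tmp/app.count then gives the latest count without scanning the log. Note that awk may buffer its output when it is not writing to a terminal.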

---------- Post updated at 05:02 PM ---------- Previous update was at 03:42 PM ----------

Not the weepy face! Try this fast wc -l stdin C bit using read() and quarter meg buffer:

Code:
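#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <unistd.h>     /* read() */
#include <errno.h>

int main(void){

        char buf[262144];       /* quarter-meg read buffer */
        long long ct = 0 ;      /* newlines counted so far */
        char *cp ;
        ssize_t ret ;           /* read() returns ssize_t, not int */

        /* Read until EOF (0) or a real error; retry on EAGAIN/EINTR. */
        while ( 0 < ( ret = read( 0, buf, sizeof(buf)))
         || ( ret == -1
           && ( errno == EAGAIN
             || errno == EINTR ))){

                if ( ret < 0 )
                        continue ;

                /* Count the newlines in the bytes just read. */
                for ( cp = buf ; cp < buf + ret ; cp++ ){
                        if ( *cp == '\n' ){
                                ct++ ;
                         }
                 }
         }

        if ( ret ){     /* nonzero here means read() failed */
                perror( "fwcl: stdin" );
                exit( 1 );
         }

        printf( "%lld\n", ct );

        exit( 0 );
 }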


$ wc -l <.profile
70
$ fwcl <.profile
70
$

# 13  
Old 10-16-2010
I added fwcl to the table (post #8). YMMV.

---------- Post updated 16-10-10 at 08:11 ---------- Previous update was 15-10-10 at 23:43 ----------

I ran these tests again more seriously with a considerably larger file:

Command                                     Time
wc -l testfile                              0m53.176s
wc -l < testfile                            0m51.629s
gawk 'END{printf ("%d\n", NR)}' testfile    8m44.512s
mawk 'END{printf ("%d\n", NR)}' testfile    3m32.646s
sed -n '$=' testfile                        35m21.385s
grep -c \$ testfile                         71m33.431s
wcc testfile                                10m55.281s
fwcl < testfile                             0m35.315s

This time DGPickett's C bit is the clear winner.

Last edited by Scrutinizer; 10-16-2010 at 01:24 PM..
# 14  
Old 10-16-2010
Hi.

If one can be satisfied with an estimate, then code that samples the file can be very fast.

As DGPickett said, seeks are fast. This demo code, esmele, reads the first 100 lines of the file (almost 1 GB), then seeks to 6000 characters before EOF and reads again (88 lines in this case). It computes the mean line lengths of the samples and then estimates the line count from another quickly accessible characteristic, the length of the file as reported by stat. Here the estimate is within 2% of the wc count. The run time is essentially constant. Reading a percentage of the file instead, say 3% at the beginning, middle, and end, could be more accurate, at the expense of taking more time.

Code:
% ./compare-esmele-wc 

-----
 File characteristics:
-rw-r--r-- 1 955M Oct 16 05:54 /tmp/test-one-gb

-----
 Time and result of esmele on /tmp/test-one-gb:

real	0m0.011s
user	0m0.008s
sys	0m0.004s
14958698

-----
 Time and result of wc on /tmp/test-one-gb:

real	0m2.739s
user	0m1.212s
sys	0m0.480s
14754910

-----
 Ratio of wc / es counts:
0.986377
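
The source of esmele is not shown, so the following is only a rough sketch of the sampling idea under stated assumptions. The name estimate_lines is made up, the two samples are pooled into a single mean line length (esmele may weight them differently), and wc -c stands in for stat to get the file size portably.

```shell
#!/bin/sh
# estimate_lines FILE   (hypothetical helper)
# Estimates the line count from the first 100 lines and the last
# 6000 bytes, scaled up by the total file size.
estimate_lines() {
    file=$1
    size=$(( $(wc -c < "$file") ))                  # total bytes in the file

    head_bytes=$(( $(head -n 100 "$file" | wc -c) ))
    head_lines=$(( $(head -n 100 "$file" | wc -l) ))
    tail_bytes=$(( $(tail -c 6000 "$file" | wc -c) ))
    tail_lines=$(( $(tail -c 6000 "$file" | wc -l) ))

    # Pooled mean line length = sampled bytes / sampled newlines,
    # so estimate = size / mean = size * lines / bytes.
    awk -v s="$size" -v b="$(( head_bytes + tail_bytes ))" \
        -v l="$(( head_lines + tail_lines ))" 'BEGIN {
        if (l == 0 || b == 0) { print 0; exit }
        printf "%d\n", s * l / b
    }'
}
```

For a file small enough that both samples cover it entirely, the estimate equals the true count. On larger files its accuracy depends on how representative the start and end of the file are of the rest.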

Best wishes ... cheers, drl