Hi, Scrutinizer.
Quote:
Originally Posted by Scrutinizer
@drl, grep cannot do this and I do not think cgrep is present on Solaris, is it? cgrep looks nice though, and it is fast indeed. I presume cgrep was tested against gawk, which is one of the slowest awks. Perhaps you could compare it to the fastest awk, which is mawk.
I have only the old Solaris-X86 running in a VM:
Code:
OS, ker|rel, machine: SunOS, 5.10, i86pc
Distribution : Solaris 10 10/08 s10x_u6wos_07b X86
There are a number of repos which may have it, but I have not searched extensively. I can try to see if cgrep will compile on Solaris (it was an easy make on Linux, both 32- and 64-bit), but that will be a low-priority task.
An excerpt from a searching benchmark on a 100MB file shows:
% ./s2
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
cgrep ATT cgrep 8.15
gawk GNU Awk 3.1.5
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan
-----
Input file /tmp/100-mb.txt is 1,777,700 lines, 120,540,400 characters:
Edges: 5:0:5 of 1777700 lines in file "/tmp/100-mb.txt"
Preliminary Matter.
This text of Melville's Moby-Dick is based on the Hendricks House edition.
It was prepared by Professor Eugene F. Irey at the University of Colorado.
Any subsequent copies of this data must include this notice
---
AND FLOATED BY MY SIDE. +BUOYED UP BY THAT COFFIN, FOR ALMOST ONE WHOLE DAY
AND NIGHT, +I FLOATED ON A SOFT AND DIRGE-LIKE MAIN. +THE UNHARMING SHARKS,
THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN
-----
Results for cgrep:
real 0m0.224s
user 0m0.104s
sys 0m0.100s
-----
Results for gawk:
real 0m1.453s
user 0m1.328s
sys 0m0.092s
-----
Results for mawk:
real 0m1.105s
user 0m0.988s
sys 0m0.096s
If there is something that takes a cache hit, it would be the wc, or at least the cgrep ... cheers, drl
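For reference, the wall-clock ("real") ratios implied by the three timings above can be checked with a short awk one-off. This is only arithmetic on the numbers already posted, and the function name speed_ratios is made up for illustration:

```shell
# speed_ratios: print ratios of the "real" times quoted in the benchmark above.
# (Hypothetical helper name; the three values are copied from the posted output.)
speed_ratios() {
    awk 'BEGIN {
        cgrep = 0.224; gawk = 1.453; mawk = 1.105   # seconds, "real" column
        printf "gawk/cgrep %.1f\n", gawk / cgrep
        printf "mawk/cgrep %.1f\n", mawk / cgrep
        printf "gawk/mawk  %.1f\n", gawk / mawk
    }'
}
speed_ratios
```

On these numbers that comes to roughly 6.5, 4.9 and 1.3, so in this particular run mawk is only about 30% faster than gawk, while cgrep is about 5 times faster than mawk.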
With an input file similar to your Moby Dick text, not directly related to the problem at hand in this thread (and in which there were no matches), I also get a factor of 5 difference between gawk and mawk, so your result may be a compile thing? The difference between cgrep and mawk is a factor of 6.
With an input file that is a large version of the input file of the problem in this thread, mawk and cgrep are about the same speed, with mawk being 5-10% faster than cgrep, while the difference between mawk and gawk was still a factor of 5 to 5.5.
Hello,
Thank you very much for your effort. It looks like very good craftsmanship; unfortunately I cannot test it anywhere, as I don't have cgrep on any of my machines.
I have a large dataset with the following structure:
C 0001 Carbon
D SAR001 methane
D SAR002 ethane
D SAR003 propane
D SAR004 butane
D SAR005 pentane
C 0002 Hydrogen
C 0003 Nitrogen
C 0004 Oxygen
D SAR011 ozone
D SAR012 super oxide
C 0005 Sulphur
D SAR013... (3 Replies)
I have a file like this
cat ex1.txt
</DISCOUNTS>
<B2B_SPECIFICATION elem="0">
<B2B_SPECIFICATION elem="0">
<DESCR>Netti 2 </DESCR>
<NUMBER>D02021507505</NUMBER>
</B2B_SPECIFICATION>
<B2B_SPECIFICATION elem="1">
<DESCR>Puhepaketti</DESCR>... (2 Replies)
This is a variation of an earlier post found here:
unixcom/shell-programming-scripting/159821-merge-two-non-consecutive-lines.html
User Bartus11 was kind enough to solve that example.
Previously, I needed help combining two lines that are non-consecutive in a file. Now I need to do the... (7 Replies)
I have several very large files that are extracts from Oracle tables. These files are formatted in XML-type syntax with multiple entries like:
<ROW>
some information
more information
</ROW>
I want to grep for some words, then print all lines between <ROW> AND </ROW>. Can this be done with AWK?... (7 Replies)
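One way this kind of block search could look in awk, as a minimal sketch only: it assumes <ROW> and </ROW> each sit on their own line as in the sample, treats the search word as a fixed string, and uses a made-up function name (print_rows). On Solaris, the old /usr/bin/awk would likely need to be swapped for nawk or /usr/xpg4/bin/awk because of the -v option.

```shell
# print_rows WORD FILE  (hypothetical helper)
# Print every <ROW>...</ROW> block in FILE that contains WORD somewhere inside it.
print_rows() {
    awk -v word="$1" '
        /<ROW>/   { buf = ""; inrow = 1 }      # a new block starts: reset the buffer
        inrow     { buf = buf $0 "\n" }        # collect every line while inside a block
        /<\/ROW>/ {                            # block ends: print it only if WORD was seen
            if (inrow && index(buf, word)) printf "%s", buf
            inrow = 0
        }
    ' "$2"
}
```

Called as print_rows "some word" extract.xml (file name is a placeholder), it buffers one block at a time, so memory use stays small even on very large extracts.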
I need to grep for STRING_A and the next few lines after it.
example file:
STRING_A yada yada
line 1
line 2
STRING_B yada yada
line 1
line 2
line 3
STRING_A yada yada
line 1
line 2
line 3
line 4
STRING_A yada yada
line 1
line 2
line 3
line 4 (7 Replies)
Hi experts,
I want to grep for the number 9366109380 in a file and also show the next 5 lines. Below is an example:
When I grep for 989366109380, I also want to see the next 5 lines.
Line 1. <fullOperation>MAKE:NUMBER:9366109380:PPAY2;</fullOperation>
Line 2.... (10 Replies)
Need help on this. Let's say I have one file with the contents below:
STRING
Description bla bla bla
Description yada yada yada
Data bla bla
Data yada yada
How do I display n lines after the string?
Thanks in advance! (8 Replies)
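Where GNU grep is available, grep -A NUM prints NUM lines of trailing context after each match. Where it is not, a portable awk sketch along these lines may do; the function name after_match and its argument order are invented here for illustration, and the pattern is treated as an awk regular expression:

```shell
# after_match PATTERN N FILE  (hypothetical helper)
# Print each line matching PATTERN plus the N lines that follow it.
after_match() {
    awk -v pat="$1" -v n="$2" '
        $0 ~ pat { print; left = n; next }   # matching line: print it and (re)start the countdown
        left > 0 { print; left-- }           # following lines: print while the countdown lasts
    ' "$3"
}
```

For the sample above, after_match STRING 2 file would print the STRING line and the two Description lines. A new match inside the trailing window simply restarts the count.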