I'm looking for an awk command that is efficient. If I just smack something together, I'm pretty sure it won't be efficient, so I'm hoping someone has a better way of doing this:
grep should perform way better than awk in pattern matching.
You must be using a very good grep and a very bad awk if you see a significant difference in simply printing matching lines.
Not only do I not see a big difference, but awk wins one of the tests.
Using a pair of GNU implementations (neither of which is renowned for speed):
Code:
$ awk --version | head -n1
GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.2, GNU MP 4.3.2)
$ grep --version | head -n1
GNU grep 2.6.3
Fixed string:
Code:
$ time seq 500000 | grep -c 434
2484
real 0m15.266s
user 0m14.685s
sys 0m0.061s
$ time seq 500000 | grep -Fc 434
2484
real 0m15.266s
user 0m14.919s
sys 0m0.015s
$ time seq 500000 | awk '/434/ {++i} END {print i}'
2484
real 0m14.813s
user 0m14.888s
sys 0m0.030s
Regular expression with wildcard:
Code:
$ time seq 500000 | grep -c '4.*4'
73535
real 0m14.844s
user 0m15.968s
sys 0m0.015s
$ time seq 500000 | awk '/4.*4/ {++i} END {print i}'
73535
real 0m15.047s
user 0m14.998s
sys 0m0.076s
As you want the line count per file, you need to read every file entirely; I don't see much chance to improve on speed...
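For reference, a per-file version of that count might look something like the following (a sketch, not necessarily the exact command being discussed; it assumes a POSIX awk, and the output format matches the examples further down in the thread):
Code:
awk '
    { total[FILENAME] = FNR }                  # running line count per file
    /Customer.*Processed/ { hits[FILENAME]++ } # lines matching the pattern
    END {
        for (f in total)                       # note: for-in order is unspecified
            printf "%s,%dlines,%dlines matching \047Customer.*Processed\047\n",
                   f, total[f], hits[f] + 0
    }
' /data/projects/*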
thank you!!!
This worked perfectly. Is there any way I can instruct awk to do exactly what you're doing here, but treat any file it finds that isn't plain text (e.g. gzip files) differently?
For instance, grepping for the string won't work on files that are gzipped. I do know you can use the following for reading gzip files:
The problem I'm having is incorporating this command into your awk command so it is kicked off ONLY when awk comes across a file that isn't plain text.
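One way to sketch that is to do the file-type test in the shell rather than inside awk, decompressing on the fly only for gzipped files (here judged by the .gz extension; the file(1) utility could be used for a content-based check instead). The loop and the fname variable below are illustrative, not a tested script:
Code:
#!/bin/sh
# Feed each file to the same awk logic, decompressing .gz files on the fly.
for f in /data/projects/*; do
    case $f in
        *.gz) zcat "$f" ;;   # or: gzip -dc "$f"
        *)    cat  "$f" ;;
    esac |
    awk -v fname="$f" '
        { total = FNR }
        /Customer.*Processed/ { hits++ }
        END { printf "%s,%dlines,%dlines matching \047Customer.*Processed\047\n",
                     fname, total + 0, hits + 0 }
    '
done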
Why don't you gunzip all files upfront and then apply the awk script to the entire directory?
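That is, something along these lines (note that plain gunzip replaces each .gz file with its decompressed version; GNU gzip 1.6+ has a -k option to keep the originals):
Code:
gunzip /data/projects/*.gz
# ...then run the per-file awk count over /data/projects/* as before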
Actually, that's the least of my problems now; I believe I'll be able to figure that out at the end. The only other question I have is: let's say the first time I run this command, I get output similar to this:
Code:
first run:
/data/projects/file01,300lines,130lines matching 'Customer.*Processed'
(Note: this is just one file out of many that would be in the output.)
Now, the above output is saved to a file called /tmp/results.txt.
The second time I run this command, say 5 minutes later, there'd be a line in the output similar to:
Code:
second run:
/data/projects/file01,410lines,139lines matching 'Customer.*Processed'
Now, I don't want to search through each file again. I want to begin from the point where the last scan left off.
In the first run, there were 300 lines in the file named /data/projects/file01. I want it so that, the next time I run the script, awk begins from line 301 and reads to the end of the file, and I want this to happen for all the files it finds in the directory. That way, only the first run will be slow; all runs after that will be fast.
If, while comparing the most recent list of files against the previous scan, it finds a file that didn't exist before, it should scan that file in its entirety, because it would be considered new.
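One way to sketch that: persist each file's line and match counts in a state file, and skip already-seen lines on the next run. The state file name (/tmp/results.state) and the whole script are illustrative assumptions, not a tested solution, and it doesn't handle the gzip case yet:
Code:
awk '
    BEGIN {
        state = "/tmp/results.state"             # hypothetical state file
        while ((getline line < state) > 0) {     # previous run: file,lines,matches
            split(line, a, ",")
            lines[a[1]] = a[2] + 0
            hits[a[1]]  = a[3] + 0
        }
        close(state)
    }
    FNR <= lines[FILENAME] { next }              # skip lines scanned last time
    { lines[FILENAME] = FNR }                    # unseen files start from line 1
    /Customer.*Processed/ { hits[FILENAME]++ }   # counts accumulate across runs
    END {
        printf "" > state                        # truncate, then rewrite the state
        for (f in lines) {
            printf "%s,%dlines,%dlines matching \047Customer.*Processed\047\n",
                   f, lines[f], hits[f] + 0
            printf "%s,%d,%d\n", f, lines[f], hits[f] + 0 > state
        }
    }
' /data/projects/*
Files that were deleted between runs would still be reported from the old state; pruning those is left out for brevity.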