I timed my awk script (from post #5) against a midsized Gutenberg collection (5,354,620 lines of text in 203 documents, 20 directories), using a phrase list of 3939 phrases.
Processing time: 1h 43min
The original script is still running (over 17h now).
For some reason, I had trouble running both of those scripts. The only things I really need to change are the input/output files and the path to the directory, right?
The delete statement may also give you some issues on AIX, as I think it might be a GNU extension or only supported in later implementations of awk.
The delete statement is supported in POSIX awk, but only for individual array elements, not for whole arrays. So delete h is an extension, but this should work:
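The code posted at this point is not preserved here. As a hedged illustration only, one widely used portable idiom for emptying an array is `split("", h)`; whether that is what was suggested in this post is an assumption. The function name `distinct_words_per_line` and the array name `seen` below are made up for the example.

```shell
# Sketch: count distinct words on each input line, emptying the array
# between lines without relying on "delete seen" (whole-array delete).
distinct_words_per_line() {
    awk '{
        split("", seen)            # splitting an empty string clears the array
        n = 0
        for (i = 1; i <= NF; i++) {
            if (!($i in seen)) {   # first time this word appears on the line
                seen[$i] = 1
                n++
            }
        }
        print n
    }' "$@"
}
```

The same effect can be had with `for (k in seen) delete seen[k]`, which only uses the element form of delete.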
Quote:
This would mean that only words or phrases between spaces get matched, but there are undoubtedly cases with , . ! ? ; : and matches at the beginning or end of a line, no?
---------- Post updated at 13:14 ---------- Previous update was at 07:13 ----------
Given the word matching abilities of grep I thought the best approach would be to optimize your script. I changed the following parts:
- Replace all those finds with a single find and store the result in a variable called filelist.
- Run grep with the -l option and change the environment variable IFS so that it only contains a linefeed; the sort, the cut and the call to /dev/null are then no longer needed.
- Add the -F flag to grep to switch off regex matching and use literal matching. This also ensures that no unintended matches occur.
This resulted in this script you could try:
Preliminary testing showed a speed improvement of about a factor of 15, YMMV.
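The script itself is not reproduced here. The three changes listed above can be sketched roughly as follows; this is a minimal illustration under assumptions, not the script that was posted. The function name `list_phrase_matches`, the `*.txt` filter, the `phrase|file` output format and the use of `-w` for word matching are all assumptions.

```shell
# Sketch only: prints "phrase|file" for every file containing the phrase.
list_phrase_matches() {
    dir=$1          # directory tree holding the documents (assumed layout)
    phrasefile=$2   # one phrase per line

    # One find for the whole run instead of one per phrase.
    filelist=$(find "$dir" -type f -name '*.txt')
    # Without files grep would read stdin, i.e. the phrase list itself.
    [ -n "$filelist" ] || return 0

    oldifs=$IFS
    # Split on linefeed only, so paths and phrases with spaces stay intact.
    IFS='
'
    while read -r phrase; do
        [ -n "$phrase" ] || continue
        # -l: print file names only; -F: literal match; -w: whole words.
        for f in $(grep -lFw -e "$phrase" $filelist); do
            printf '%s|%s\n' "$phrase" "$f"
        done
    done < "$phrasefile"
    IFS=$oldifs
}
```

It could be called as `list_phrase_matches /path/to/files phrases.txt > matches.txt`. Because `$filelist` is expanded unquoted on every grep call, a very large file list can run into the argument-length limit of the system.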
Last edited by Scrutinizer; 03-16-2012 at 09:24 AM..
AIX 5.3 by default only has 4K for argument expansion, which can result in Argument/Parameter list too long errors when processing quite short parameter strings.
I'm almost sure that xargs would be required in the above script, depending on the actual length of /path/to/files and the current value of the ncargs OS parameter.
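As a rough sketch of the xargs approach mentioned above: instead of expanding the whole file list on one command line, the names are piped to xargs, which runs grep in batches that fit the limit. The function name `files_with_phrase` is hypothetical, and the sketch assumes file names without whitespace or quotes, since plain xargs splits on those.

```shell
# Sketch: list files under a directory that contain a literal phrase,
# letting xargs split the file list into batches of acceptable length.
files_with_phrase() {
    phrase=$1
    dir=$2
    find "$dir" -type f -print | xargs grep -lF -e "$phrase"
}
```

For example, `files_with_phrase 'quick brown' /path/to/files`. POSIX `find ... -exec grep -lF -e "$phrase" {} +` batches arguments in a similar way and avoids the whitespace problem, where the local find supports it.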
You can try this. Its limit is shell expansion, and it doesn't cope well if the same search string appears two or more times in the same file.
In that case it will give output like: