Can this awk statement be optimized? I ask because log.txt is a giant file with several hundred thousand lines of records.
The first column in "log.txt" contains the file name.
The second column in "log.txt" contains the last known total number of lines for each file.
myscript.sh reads the file "log.txt" and, for each file it finds, gets the line number from the second column, begins scanning that file from that line number, and counts the number of times the search term provided by the user appears.
The worst approach (performance-wise) is:
This creates three child processes for each line in the input file; 300K lines means roughly a million process creations.
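The pattern being criticized might look something like this (a hypothetical reconstruction, since the quoted code block was not preserved; the file names and sample data are invented for illustration):

```shell
# Hypothetical reconstruction of the per-line fork pattern; sample data
# is made up for illustration.
printf 'hit\nmiss\nhit\n' > data1.txt
printf 'data1.txt 1\n'    > log.txt

out=$(
    while read -r fname startline; do
        total=$(( $(wc -l < "$fname") ))                  # child process 1
        tailpart=$(tail -n +"$startline" "$fname")        # child process 2
        count=$(printf '%s\n' "$tailpart" | grep -c hit)  # child process 3
        echo "$fname $count $total"
    done < log.txt
)
echo "$out"
```

Every `$( )` here forks at least one child; over 300K input lines, that fork overhead dominates the run time.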
awk, perl, and ruby can do almost anything with a single process creation, because the needed tools are built into the language.
Consider simply using a larger awk program.
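For instance, a single awk process can read log.txt and do all of the per-file scanning itself. This is only a sketch of the idea, not the poster's actual code; the file names and search term are made up:

```shell
# One awk process handles every file listed in log.txt; no per-line forks.
# File names and the search term are invented for this example.
printf 'hit\nmiss\nhit\nhit\n' > data1.txt
printf 'data1.txt 2\n'         > log.txt   # resume data1.txt at line 2

result=$(awk -v term="hit" '
    {
        file = $1; start = $2; hits = 0; n = 0
        while ((getline line < file) > 0) {
            n++
            if (n >= start && index(line, term)) hits++
        }
        close(file)
        print file, hits, n   # file, new matches, new total line count
    }
' log.txt)
echo "$result"                # prints: data1.txt 2 4
```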
Or:
So, if your awk commands could be accomplished with parameter substitution in bash or ksh, you would speed things up enormously.
Example:
is doing nothing more than getting a field from data like this:
where it appears that you want to store "fap" in a variable.
Since you seem to have done this several times in your code, I'm hoping to get you past the problem.
Can you think of a way to get "fap" using one of bash's ${ } constructs, or maybe use set
and one of the bash commands to get "fap"?
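For example (the full data line is an assumption; only the field "fap" is known from the thread):

```shell
# Hypothetical record; only the "fap" field comes from the thread.
line="fap 1002 47"

# bash parameter expansion: delete everything from the first space onward
first=${line%% *}
echo "$first"     # prints: fap

# or: load the fields into the positional parameters
set -- $line
echo "$1"         # prints: fap
```

Both forms are shell builtins, so no child process is created.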
You are right, Jim. My original code is very far from being efficient; the command above can probably replace it with ease.
I can't seem to think in awk, even though I prefer it over any other language. I'm more comfortable with bash, but I really want awk to do this.
In the code I quoted, I'm interested in the second field, which is the line number. I need to store the count of the search terms and the new total line count of each file, and then send that information to another file.
Thinking in bash, anyone who knows anything about scripting can figure out exactly what I'm doing here very easily.
Translating this into an efficient awk program is where I need serious help.
Step 1 is to define "static" variables outside the loop. Step 2 is to use the full potential of the read command. Step 3 is to get both termcount and newfilelinecount in one stroke. Step 4 is to have the sole output at the end of the loop.
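Those four steps might come together like this (a sketch only; the variable names, sample data, and output format are assumptions, not the actual code from the thread):

```shell
# Sketch of the four steps; variable names, sample data, and output
# format are assumptions, not the actual code from the thread.
printf 'hit\nmiss\nhit\nhit\n' > data1.txt
printf 'data1.txt 2\n'         > log.txt

term="hit"                        # Step 1: "static" variables set once,
outfile="newlog.txt"              # outside the loop

while read -r fname lastline; do  # Step 2: read splits both fields itself
    # Step 3: one awk run yields termcount and the new line count together
    counts=$(awk -v start="$lastline" -v t="$term" '
        NR >= start && index($0, t) { c++ }
        END { print c+0, NR }
    ' "$fname")
    termcount=${counts% *}
    newtotal=${counts#* }
    printf '%s %s %s\n' "$fname" "$newtotal" "$termcount"
done < log.txt > "$outfile"       # Step 4: the sole output, at the end

cat "$outfile"                    # prints: data1.txt 4 2
```

This still forks one awk per listed file, but that is one fork per file instead of three, and the output file is opened once instead of on every iteration.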