Thanks for not inserting the files into this forum, as they would be too much.
Furthermore, it's annoying to have to download anything and install an unpacking program before I can even read the files. May I politely ask you to use a pasting service instead? Maybe this one:
To expand on what stomp already said, your script reads the input file n+1 times: once to detect the number of reports ( = n) and their respective locations (one sed and one awk invocation), then once per report in a shell loop to "extract" and analyse each single report, invoking awk 13 times and sed 4 times per iteration (i.e. 130 awks and 40 seds for the sample file with 10 reports).
No surprise this gets somewhat lengthy on large files with many reports...
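To illustrate the process-startup cost alone, compare one awk per "report" against a single pass (a minimal, hypothetical demo in bash/ksh; the dummy file and the per-pass work are stand-ins, not your actual script):
Code:
# build a dummy 100000-line input (stand-in for the real data)
seq 1 100000 > bigfile

# slow: one awk invocation (and one full read of the file) per pass
time for i in 1 2 3 4 5 6 7 8 9 10
do
    awk -v n="$i" 'NR % 10 == n - 1 { s += $1 } END { print n, s }' bigfile
done

# fast: the same work in a single pass with one awk
time awk '{ s[NR % 10] += $1 } END { for (k in s) print k + 1, s[k] }' bigfile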
Sometimes I have files with more than 20,000 lines, and in that case it takes a long time to finish the job.
Could you kindly help me to improve it?
I appreciate your help.
I've not read the whole thing, but assuming that you are working through the file line by line, and that this design is actually re-reading the whole file every time round your loop and calling 10 processes to split up each line, how about this construct instead:-
Code:
while read tap rec lin pnt spx spy spz tim lfr lto unused
do
    whatever_you_need_here
done < info_records.list
Okay, so this is sh / ksh / bash, but it is far neater and has far lower overheads. I've not got a true csh available, and the manual page I have uses all sorts of bash phrasing, so it is only imitating some csh scripting; I can't really test anything I write in csh.
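For instance, if all you needed were a few of those fields, the loop body could be as simple as this (purely illustrative; the printf is my assumption, not your actual processing):
Code:
while read tap rec lin pnt spx spy spz tim lfr lto unused
do
    printf 'Tape %s, record %s, line %s, point %s\n' "$tap" "$rec" "$lin" "$pnt"
done < info_records.list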
You might be better off with this csh mangle instead of calling awk all over the place:-
Code:
foreach line ( "`cat input_file`" )
    set parsed = ($line)
    set tap = $parsed[1]
    set rec = $parsed[2]
    set lin = $parsed[3]
    :
    :
end
This reads the file once, and for each line it splits the record into the variables you want without calling external commands (except the cat, which I can't find a way to remove).
Overall, though, it is worth the effort to convert to sh-based scripts. I hope your code is not riddled with goto statements like some I have had to decipher before; that kind of poor programming can cause serious headaches when re-designing.
I hope that this helps,
Robin
You might want to try this one. Due to the input file structure, the file must be read twice: once to identify the respective reports, and a second time to extract the data and produce the output files.
As you can see below, the X output exactly matches your sample output. The S file doesn't, as I don't understand your date/time function and thus can't replicate it. No more shell loops, no sed, and just two awk invocations; I'd guess it should save serious amounts of time. Please report back.
Code:
awk -F: '
# Pass 1: emit one summary line per report -- the eight header fields
# followed by the report's first and last line number in the input.
BEGIN {
    FMT = "%d %d %d %d %11.1f %11.1f %11.1f %s %010d %010d\n"
    # IX keeps the field order; SRCH is the set of field names to pick up.
    for (n = split ("Tape_Nb:File_Nb:Line_Name:Point_Number:Cog_Easting:Cog_Northing:Cog_Elevation:Tb_GPS_Time", IX); n > 0; n--) SRCH[IX[n]]
}
# A new report starts: flush the previous one with its from/to line numbers.
$1 == "Observer_Report " {
    if (flag) printf FMT, OUT[IX[1]], OUT[IX[2]], OUT[IX[3]], OUT[IX[4]],
                          OUT[IX[5]], OUT[IX[6]], OUT[IX[7]], OUT[IX[8]], from, to
    delete OUT
    from = NR
    flag = 1
}
# Remove all spaces so "Name : value" splits cleanly on ":"; track the last line seen.
{
    gsub (/[ ]/, _)
    to = NR
}
$1 in SRCH {
    OUT[$1] = $2
    if ($1 ~ /Tb_GPS_Time/) OUT[$1] = substr($2,2,16)
}
# Flush the final report.
END {
    printf FMT, OUT[IX[1]], OUT[IX[2]], OUT[IX[3]], OUT[IX[4]],
                OUT[IX[5]], OUT[IX[6]], OUT[IX[7]], OUT[IX[8]], from, to
}
' /tmp/16.txt |
awk -F'[:-(]' '
# (FS quoted above so the shell cannot glob-expand the bracket expression.)
# Pass 2: read the summary lines from the pipe first ("-"), then the data file.
BEGIN {
    HD1 = "H26 5678901234567890123456789012345678901234567890123456789012345678901234567890"
    HD2 = "H26 1 2 3 4 5 6 7 "
}
# Write the two header lines to both output files once.
NR == 1 {
    print HD1 RS HD2 > XFILE
    print HD1 RS HD2 > SFILE
}
# First input (the pipe): store the per-report summaries.
FNR == NR {
    OR[NR] = $0
    MX = NR
    next
}
# Second input: once past the current report (its last line number is T[n]),
# fetch the next summary and write its S record.
FNR > NXTREP ||
FNR == 1 {
    n = split (OR[++OCNT], T, " ")
    NXTREP = T[n] + 0
    printf "S%10.2f%10.2f%3d1 %9.1f%10.1f%6.1f%09d\n", T[3], T[4], 1, T[5], T[6], T[7], T[8] > SFILE
}
# Trim leading blanks and blanks around colons.
{
    sub (/^[ ]*/, _)
    sub (/ *: */, ":")
}
# Data lines follow a Live_Seis header; any non-numeric line ends the block.
$1 ~ /^Live_Seis/ {
    DATA = 1
    sub (/Live_Seis[^:]*:/, _)
}
/[^0-9:() -]/ {
    DATA = 0
}
DATA {
    printf "X%6d%8d11%10.2f%10.2f%1d%5d%5d1%10.2f%10.2f%10.2f1\n", T[1], T[2], T[3], T[4], 1, $4, $5, $1, $2, $3 > XFILE
}
' XFILE="xfile" SFILE="sfile" - /tmp/16.txt
diff xfile /tmp/16.xx01 # no output from diff -> no difference!
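To verify the saving, you could time the old and new versions side by side (a sketch; the script names are placeholders for wherever you keep the two versions):
Code:
time csh old_extract.csh     # original: n+1 reads, ~170 process startups
time sh  new_extract.sh      # the pipeline above: two awk invocations total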