Improve script - slow process with big files


 
# 8 - 01-26-2017
Dear RudiC,

Thanks a lot for this great job.

I think I am missing something, because when I use the code I get the following output for sfile.
Code:
H265678901234567890123456789012345678901234567890123456789012345678901234567890
H26      1         2         3         4         5         6         7          
S      0.00      0.00  11                           0.0       0.0   0.0000000004
S      0.00      0.00  11                           0.0       0.0   0.0000000150
S      0.00      0.00  11                           0.0       0.0   0.0000000296
S      0.00      0.00  11                           0.0       0.0   0.0000000442
S      0.00      0.00  11                           0.0       0.0   0.0000000588
S      0.00      0.00  11                           0.0       0.0   0.0000000734
S      0.00      0.00  11                           0.0       0.0   0.0000000880
S      0.00      0.00  11                           0.0       0.0   0.0000001026
S      0.00      0.00  11                           0.0       0.0   0.0000001172
S      0.00      0.00  11                           0.0       0.0   0.0000001318

I don't get the data for column 2 and the other columns.

Could you please send me the output you got?

Thanks and regards

# 9 - 01-26-2017
This is what I get for SFILE:
Code:
H26 5678901234567890123456789012345678901234567890123456789012345678901234567890
H26      1         2         3         4         5         6         7          
S  67609.00  30835.00  11                      240038.1 2786615.9 373.82147483647
S  67609.00  30841.00  11                      240113.1 2786612.8 373.72147483647
S  67607.00  30841.00  11                      240111.7 2786588.4 373.92147483647
S  67605.00  30841.00  11                      240111.1 2786562.3 374.32147483647
S  67603.00  30841.00  11                      240116.1 2786537.1 374.42147483647
S  67609.00  30851.00  11                      240237.3 2786613.9 373.32147483647
S  67609.00  30491.00  11                      235736.9 2786612.1 368.72147483647
S  67607.00  30491.00  11                      235734.3 2786587.1 369.32147483647
S  67605.00  30491.00  11                      235737.1 2786561.2 368.72147483647
S  67603.00  30491.00  11                      235738.4 2786539.5 367.92147483647

Except for the last column, which is the difficult date/time info, it is identical to your sample output. Did you test with your sample file from post #1?
# 10 - 01-26-2017
Dear RudiC,

Yes, I used the same sample file, but I really don't understand where the issue is. I also converted the file to Unix line endings to try, but that did not work either.
# 11 - 01-26-2017
After some cogitating about GPS -> UTC date/time conversion, I could replicate the date/time column in your S file using GNU date 8.25 (although I still don't understand what you are after here). Both output files are now identical to the ones you attached in post #1. Try:
Code:
awk -F: '
# Pass 1: for each Observer_Report block, collect the fields listed in SRCH and
# print one GNU date command line, built from the FMT template below, which the
# following "sh" stage then executes.
BEGIN                   {FMT = "date +\"%d %d %d %d %11.1f %11.1f %11.1f 0%%d%%H%%M%%S %010d %010d\" -d@%s\n"
                         for (n = split ("Tape_Nb:File_Nb:Line_Name:Point_Number:Cog_Easting:Cog_Northing:Cog_Elevation:Tb_GPS_Time", IX); n>0; n--) SRCH[IX[n]]
                        }

$1 ~ /^Observer_Report/ {if (flag)      printf FMT,     OUT[IX[1]], OUT[IX[2]], OUT[IX[3]], OUT[IX[4]],
                                                        OUT[IX[5]], OUT[IX[6]], OUT[IX[7]], from, to, OUT[IX[8]] + 315961200 + 10783    # epoch = GPS + 6.1.1980 + 3h - 17 sec
                         delete OUT
                         from = NR
                         flag = 1
                        }

                         {gsub (/[ \t]/, _)     # strip spaces and TABs (a literal TAB here tends to get lost when copied; see post #13)
                         to = NR
                        }

$1 in SRCH              {OUT[$1] = $2
                        }
$1 ~ SRCH[IX[8]]        {OUT[$1] = substr($2,1,10)
                        }

END                     {printf FMT,    OUT[IX[1]], OUT[IX[2]], OUT[IX[3]], OUT[IX[4]],
                                        OUT[IX[5]], OUT[IX[6]], OUT[IX[7]], from, to, OUT[IX[8]] + 315961200 + 10783                    # epoch = GPS + 6.1.1980 + 3h - 17 sec
                        }
' /tmp/16.txt |

sh |

awk -F[:-\(] '
# Pass 2: the first input ("-") is the output of the date commands run by sh;
# it is stored in OR[]. The observer report is then read a second time to write
# the S records (one per report block) to SFILE and the X records (built from
# the Live_Seis lines) to XFILE.
BEGIN                   {HD1 = "H26 5678901234567890123456789012345678901234567890123456789012345678901234567890"
                         HD2 = "H26      1         2         3         4         5         6         7          "
                        }
NR == 1                 {print HD1 RS HD2 > XFILE
                         print HD1 RS HD2 > SFILE
                         }

FNR == NR               {OR[NR] = $0
                         MX = NR
                         next
                        }
FNR > NXTREP ||
FNR == 1                {n = split (OR[++OCNT], T, " ")
                         NXTREP = T[n] + 0
                         printf "S%10.2f%10.2f%3d1                     %9.1f%10.1f%6.1f%09d\n", T[3], T[4], 1, T[5], T[6], T[7], T[8] > SFILE
                        }

                        {sub (/^[       ]*/, _)
                         sub (/ *: */, ":")
                        }


$1 ~ /^Live_Seis/       {DATA = 1
                         sub (/Live_Seis[^:]*:/, _)
                        }
/[^0-9:() -]/           {DATA = 0
                        }
DATA                    {printf "X%6d%8d11%10.2f%10.2f%1d%5d%5d1%10.2f%10.2f%10.2f1\n", T[1], T[2], T[3], T[4], 1, $4, $5, $1, $2, $3 > XFILE 
                        }
' XFILE="xfile" SFILE="sfile" - /tmp/16.txt

diff xfile /tmp/16.xx01    # no diff = identical! 
diff sfile /tmp/16.ss01    # no diff = identical!
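
To decompose the magic numbers in the FMT line, a quick sketch with GNU date (the GPS value G below is just a hypothetical example): 315961200 appears to be 6 Jan 1980 00:00 evaluated in a UTC+1 time zone, i.e. one hour less than the GPS epoch in UTC (315964800), and 10783 is the "+ 3h - 17 sec" from the comment in the script.
Code:
# Sketch only, not part of the pipeline above: checking the offset constants.
date -u -d '1980-01-06 00:00:00 UTC' +%s       # 315964800 = GPS epoch in UTC (GNU date)
echo $(( 315964800 - 3600 ))                   # 315961200, the value used in FMT
echo $(( 3 * 3600 - 17 ))                      # 10783 = "3h - 17 sec"
G=1167264000                                   # hypothetical GPS second count
date -d @"$(( G + 315961200 + 10783 ))" +0%d%H%M%S   # same format string FMT hands to sh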


# 12 - 01-27-2017
Dear RudiC,

Thanks a lot for your help; it works perfectly now.

I have modified the code a little to get the correct value for the index point,
Code:
OUT[IX[8]]

and I had to normalize the whitespace (the sed below) to make the code work correctly.

Here is the last modification:

Code:
            read -p " " jd                              # read the base name of the input file (e.g. 16 for 16.txt)

sed -i -e "s/[[:space:]]\+/ /g" "$jd.txt"               # squeeze runs of whitespace (incl. TABs) to single blanks

awk -F: '
# Pass 1 as in post #11, now also collecting Point_Index; the GPS time stamp
# therefore moves from IX[8] to IX[9].
BEGIN                   {FMT = "date +\"%d %d %d %d %11.1f %11.1f %11.1f %d 0%%d%%H%%M%%S %010d %010d\" -d@%s\n"
                         for (n = split ("Tape_Nb:File_Nb:Line_Name:Point_Number:Cog_Easting:Cog_Northing:Cog_Elevation:Point_Index:Tb_GPS_Time", IX); n>0; n--) SRCH[IX[n]]
                        }

$1 ~ /^Observer_Report/ {if (flag)      printf FMT,     OUT[IX[1]], OUT[IX[2]], OUT[IX[3]], OUT[IX[4]],
                                                        OUT[IX[5]], OUT[IX[6]], OUT[IX[7]], OUT[IX[8]], from, to, OUT[IX[9]] + 315961200 + 10783    # epoch = GPS + 6.1.1980 + 3h - 17 sec
                         delete OUT
                         from = NR
                         flag = 1
                        }

                         {gsub (/[ \t]/, _)     # strip spaces and TABs (written with \t so the TAB cannot get lost)
                         to = NR
                        }

$1 in SRCH              {OUT[$1] = $2
                        }
$1 ~ SRCH[IX[9]]        {OUT[$1] = substr($2,1,10)
                        }

END                     {printf FMT,    OUT[IX[1]], OUT[IX[2]], OUT[IX[3]], OUT[IX[4]],
                                        OUT[IX[5]], OUT[IX[6]], OUT[IX[7]], OUT[IX[8]], from, to, OUT[IX[9]] + 315961200 + 10783                    # epoch = GPS + 6.1.1980 + 3h - 17 sec
                        }
' "$jd.txt" |

sh |

awk -F[:-\(] '
BEGIN                   {HD1 = "H26 5678901234567890123456789012345678901234567890123456789012345678901234567890"
                         HD2 = "H26      1         2         3         4         5         6         7          "
                        }
NR == 1                 {print HD1 RS HD2 > XFILE
                         print HD1 RS HD2 > SFILE
                         }

FNR == NR               {OR[NR] = $0
                         MX = NR
                         next
                        }
FNR > NXTREP ||
FNR == 1                {n = split (OR[++OCNT], T, " ")
                         NXTREP = T[n] + 0
                         printf "S%10.2f%10.2f%3d1                     %9.1f%10.1f%6.1f%09d\n", T[3], T[4], T[8], T[5], T[6], T[7], T[9] > SFILE
                        }

                        {sub (/^[       ]*/, _)
                         sub (/ *: */, ":")
                        }


$1 ~ /^Live_Seis/       {DATA = 1
                         sub (/Live_Seis[^:]*:/, _)
                        }
/[^0-9:() -]/           {DATA = 0
                        }
DATA                    {printf "X%6d%8d11%10.2f%10.2f%1d%5d%5d1%10.2f%10.2f%10.2f1\n", T[1], T[2], T[3], T[4], T[8], $4, $5, $1, $2, $3 > XFILE 
                        }
' XFILE="$jd.x" SFILE="$jd.s" - "$jd.txt"
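
For completeness, a usage sketch (the script file name is hypothetical; 16.txt is the sample file from this thread):
Code:
# Hypothetical run: save the fragment above as, say, obs2sps.sh, put the
# observer report 16.txt in the current directory, then:
bash obs2sps.sh        # enter the base name, here: 16, at the (blank) prompt
ls 16.x 16.s           # the X and S files written via XFILE / SFILE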

I appreciate your help.
# 13 - 01-27-2017
The sed is not necessary: the {gsub (/[ <TAB>]/, _)} bracket expression contained a space and a <TAB> and should remove all of those. Mayhap it got lost in transfer.
Why do you use the GPS date/time stamp and its (OK, not too) complicated transformation to UTC, if the clear-text date/time is available in the "Date" record?
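
In other words, the whitespace stripping can be written so that the TAB cannot silently disappear when the code is copied, for example (a sketch; assumes gawk or mawk, which accept \t inside a bracket expression, and a placeholder file name):
Code:
# Remove every blank and TAB from each line; \t keeps the TAB visible in the source.
awk '{ gsub(/[ \t]/, ""); print }' observer_report.txt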
# 14 - 01-27-2017
Dear RudiC,
I will check why I have problems with the TAB characters.
I use the GPS-time-to-UTC conversion only to be more precise. You are right that the date/time is already in the file, but that precision is the only reason I use the GPS time.
Thanks a lot for your help.