performance issue using gzcat, awk and sort
Hi all,
I managed to write a script that gathers a few files and sorts them. Here it is:
Code:
#!/usr/bin/ksh
ls *mainFile* |cut -c20-21 | sort > temp
set -A line_array
i=0
file_name='temp'
while read file_line
do
line_array[i]=${file_line}
let i=${i}+1
# mainFile
gzcat *mainFile-dsa${file_line}* | awk '
BEGIN { FS = "," } ;
# key is field 1 when it is non-empty, otherwise the previous key is carried forward
{if($1!="") {mykey=$1} else {mykey=prev}}
{if(mykey != prev)
{print mykey",1,"NR","$0; prev=mykey}
else
{print prev",1,"NR","$0; prev=mykey}}
' > final
# line
gzcat *line-dsa${file_line}* | awk '
BEGIN { FS = "," } ;
# key is field 1 when it is non-empty, otherwise the previous key is carried forward
{if($1!="") {mykey=$1} else {mykey=prev}}
{if(mykey != prev)
{print mykey",2,"NR","$0; prev=mykey}
else
{print prev",2,"NR","$0; prev=mykey}}
' >> final
# ss
gzcat *ss-dsa${file_line}* | awk '
BEGIN { FS = "," } ;
{print $1",3,"NR","$0;}
' >> final
#bsginfo
gzcat *bsginfo-dsa${file_line}* | awk '
BEGIN { FS = "," } ;
{print $1",4,"NR","$0;}
' >> final
#gprs
gzcat *gprs-dsa${file_line}* | awk '
BEGIN { FS = "," } ;
# key is field 1 when it is non-empty, otherwise the previous key is carried forward
{if($1!="") {mykey=$1} else {mykey=prev}}
{if(mykey != prev)
{print mykey",5,"NR","$0; prev=mykey}
else
{print prev",5,"NR","$0; prev=mykey}}
function isnum(n) { return n ~ /^[0-9]+$/ }
' >> final
#odbdata
gzcat *odbdata-dsa${file_line}* | awk '
BEGIN { FS = "," } ;
{print $1",6,"NR","$0;}
' >> final
ls *mainFile* |cut -c0-8 | sort | read data
#sort -t "," +0 -2 -n final > final2
sort -t ',' +0 -1n +1 -2n +2 -3n final > final2
#sort final > final2
rm final
rm temp
gzip final2
mv final2.gz ${data}-final-dsa${file_line}.csv.gz
done < ${file_name}
- When the number of lines in a file exceeds a few million, "NR" is printed in scientific notation instead of as a plain number, so the sort can no longer guarantee the line order.
- The server has a very heavy I/O load, so I would like to do the whole process in memory (there are idle processors and free memory).
- Can I feed the several gzcat inputs into a single awk script, or is that not possible?
- Can I use a pipe to send the previous result to the next instruction without writing to the "final" file?
- When it gets to the sort instruction, I/O use goes from 30% to 100% while memory use stays the same. Why?

Can someone help me out with any of these questions? It is getting really hard for a newbie like me to find a solution, because a system that should take one day to do its operations is taking 5 days, and I'm trying to find solutions in areas that I don't really understand yet.
Best regards,
Ricardo Tomás
Quote:
Code:
borkstation$ awk 'END { print 123456789123456 }' /dev/null
1.23457e+14
borkstation$ awk 'END { printf "%i\n", 123456789123456 }' /dev/null
2147483647
borkstation$ perl -le 'print 123456789123456'
123456789123456
Quote:
Quote:
Quote:
Code:
( awk one; awk too; awk some more ) | sort
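The grouping idea above can be sketched in a self-contained way. This is only an illustration under assumptions: `tag_stream` is a hypothetical helper standing in for each `gzcat ... | awk` stage of the original script, the `printf` lines stand in for the compressed input files, and the `-k` sort keys are the POSIX spelling of the `+0 -1n +1 -2n +2 -3n` keys used in the script.

```shell
#!/bin/sh
# Hypothetical helper: prefixes each CSV line with "key,tag,line number".
# $1 = numeric tag identifying the file type; data arrives on stdin.
tag_stream() {
    awk -F, -v tag="$1" '{ printf "%s,%s,%.f,%s\n", $1, tag, NR, $0 }'
}

# All stages write to the same pipe, so no intermediate "final" file is needed.
merged=$(
    {
        printf '2,b\n1,a\n' | tag_stream 1   # stand-in for: gzcat *mainFile-dsa* | awk ...
        printf '1,x\n2,y\n' | tag_stream 2   # stand-in for: gzcat *line-dsa* | awk ...
    } | sort -t, -k1,1n -k2,2n -k3,3n
)
printf '%s\n' "$merged"
```

Here the braces group the stages in the current shell, which behaves the same as the parenthesised subshell for this purpose. sort still has to hold or spill the whole data set, so this removes the write and re-read of the temporary file but not sort's own temporary I/O.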
I usually find printf "%.f\n",variablename does the trick in awk.
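As a rough sketch of how that could look for the line numbers in the original script: `number_lines` and its `offset` argument are hypothetical names, and the offset exists only to simulate a very large NR without feeding in millions of lines.

```shell
#!/bin/sh
# Hypothetical helper: prefixes each input line with its line number,
# formatted with %.f so that large values are not printed as 1.23457e+14.
# $1 = optional offset added to NR (an assumption, used only for the demo).
number_lines() {
    awk -F, -v offset="${1:-0}" '{ printf "%.f,%s\n", NR + offset, $0 }'
}

# Simulate a record that sits at line 123456789123456.
printf 'a,b\n' | number_lines 123456789123455
```

In the script itself the same change would mean replacing `print mykey",1,"NR","$0` with a `printf` that formats NR with `%.f`.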
sort usually has a command-line option to change the amount of memory it will allocate. The default is usually quite small, so you may see some benefit from increasing it. You can also sometimes control where it stores temporary files, so you may be able to point it at faster disks, or at disks that do not contain the original data, so that the two are not competing with each other. See man sort for details.
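A minimal sketch of those two knobs, assuming GNU sort, where `-S` sets the main memory buffer size and `-T` sets the directory for temporary files. Other implementations spell these differently, so check the local man page first. `sort_big` is a hypothetical wrapper name, and the buffer size and directory are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper around GNU sort (assumption: -S and -T are available).
# $1 = memory buffer size (for example 64M), $2 = directory for temporary files.
# Reads comma-separated data on stdin and sorts numerically on fields 1 to 3.
sort_big() {
    sort -S "$1" -T "$2" -t, -k1,1n -k2,2n -k3,3n
}

# Demo with a throwaway temporary directory; in practice this would be a
# directory on a disk that does not hold the input data.
tmpdir=$(mktemp -d)
printf '10,1,1,x\n2,1,1,y\n' | sort_big 64M "$tmpdir"
rmdir "$tmpdir"
```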
Quote:
Also, if your system starts swapping, you'll lose the memory advantage.
Quote:
In that case, using an external sort would be better.
Quote: