performance issue using gzcat, awk and sort Post: 302239940

Forum: Shell Programming and Scripting. Post 302239940 by naoseionome on Wednesday 24th of September 2008 07:30:13 PM

Hi all,
I wrote a script that gathers a few files and sorts them.

Here it is:

Code:

#!/usr/bin/ksh


ls *mainFile* |cut -c20-21 | sort > temp

set -A line_array
i=0
file_name='temp'

while read file_line
do
 line_array[i]=${file_line}
 let i=${i}+1
  


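# mainFile: a row with an empty first field inherits the key of the previous row
gzcat *mainFile-dsa${file_line}* | awk '
BEGIN { FS = "," }
# Test $1 with != ; the original if($1="") assigned to $1 instead of
# comparing it, so no key was ever set.
{ if ($1 != "") { mykey = $1 } else { mykey = prev } }
{ print mykey ",1," NR "," $0; prev = mykey }
' > final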
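# line: a row with an empty first field inherits the key of the previous row
gzcat *line-dsa${file_line}* | awk '
BEGIN { FS = "," }
# Test $1 with != ; the original if($1="") assigned to $1 instead of
# comparing it, so no key was ever set.
{ if ($1 != "") { mykey = $1 } else { mykey = prev } }
{ print mykey ",2," NR "," $0; prev = mykey }
' >> final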
# ss
gzcat *ss-dsa${file_line}* | awk '   
BEGIN { FS = "," } ; 
    {print $1",3,"NR","$0;} 
' >> final
#bsginfo
gzcat *bsginfo-dsa${file_line}* | awk '   
BEGIN { FS = "," } ; 
    {print $1",4,"NR","$0;} 
' >> final
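#gprs: a row with an empty first field inherits the key of the previous row
gzcat *gprs-dsa${file_line}* | awk '
BEGIN { FS = "," }
# Test $1 with != ; the original if($1="") assigned to $1 instead of
# comparing it, so no key was ever set.
{ if ($1 != "") { mykey = $1 } else { mykey = prev } }
{ print mykey ",5," NR "," $0; prev = mykey }
function isnum(n) { return n ~ /^[0-9]+$/ }   # defined but never called
' >> final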
#odbdata
gzcat *odbdata-dsa${file_line}* | awk '   
BEGIN { FS = "," } ; 
    {print $1",6,"NR","$0;} 
' >> final

ls *mainFile* |cut -c0-8 | sort | read data

#sort -t "," +0 -2 -n final > final2
# -k form of the obsolete "+0 -1n +1 -2n +2 -3n" key syntax
sort -t ',' -k1,1n -k2,2n -k3,3n final > final2
#sort final > final2
rm  final
rm  temp
gzip final2
mv final2.gz ${data}-final-dsa${file_line}.csv.gz


done < ${file_name}
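
A minimal sketch of the key fill-down step used in the mainFile, line and gprs blocks above, run on made-up inline data instead of the real gzipped files. The helper name tag_records and the sample rows are assumptions for illustration only. It also uses printf with %d for NR, which is one common way to keep the line number a plain integer.

```shell
#!/bin/sh
# Sketch only: tag_records is a hypothetical helper, the rows are invented.
tag_records() {
    # $1 = record-type number written to the second output column
    awk -F',' -v type="$1" '
        $1 != "" { key = $1 }      # remember the most recent non-empty key
        { printf "%s,%d,%d,%s\n", key, type, NR, $0 }
    '
}

# The second row has an empty first field, so it inherits key 100.
sample_out=$(printf '100,a\n,b\n200,c\n' | tag_records 1)
printf '%s\n' "$sample_out"
# prints:
# 100,1,1,100,a
# 100,1,2,,b
# 200,1,3,200,c
```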

My problems:
- When a file has more than a few million lines, NR is printed in scientific notation instead of as a plain integer, so the numeric sort can no longer guarantee the line order.
- The server already has a very heavy I/O load, so I would like to do the whole process in memory (there are idle processors and free memory).
- Can I feed the output of several gzcat commands into a single awk script, or is that not possible?
- Can I use a pipe to send each result to the next command without writing to the "final" file?
- When the script reaches the sort command, I/O use goes from 30% to 100% while memory use stays the same. Why?
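
One possible shape for the piping questions above, as a sketch rather than a tested replacement for the script: the producers are grouped in braces so that sort reads a single stream and no intermediate "final" file is written. The file names, the sample rows and the helper name tag are made up for the example, and gzip -dc stands in for gzcat.

```shell
#!/bin/sh
# Sketch only: sample files are created in a scratch directory (mktemp -d).
workdir=$(mktemp -d)
printf '7,x\n,y\n' | gzip > "$workdir/mainFile-dsa01.gz"
printf '3,p\n7,q\n' | gzip > "$workdir/ss-dsa01.gz"

# tag is a hypothetical helper: decompress one file and prefix each record
# with key, record type and line number. fill=1 carries the last key forward.
tag() {
    gzip -dc "$1" | awk -F',' -v type="$2" -v fill="$3" '
        { key = (fill == 1 && $1 == "") ? prev : $1; prev = key
          printf "%s,%d,%d,%s\n", key, type, NR, $0 }'
}

# The braces group both producers, so sort sees one combined input stream.
{
    tag "$workdir/mainFile-dsa01.gz" 1 1
    tag "$workdir/ss-dsa01.gz" 3 0
} | sort -t ',' -k1,1n -k2,2n -k3,3n | gzip > "$workdir/final-dsa01.csv.gz"

merged=$(gzip -dc "$workdir/final-dsa01.csv.gz")
printf '%s\n' "$merged"
rm -rf "$workdir"
```

Each call to tag starts its own awk, so NR restarts at 1 for every input file, as it does in the original script.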

Can someone help me with any of these questions?
It is getting really hard for a newbie like me to solve these problems: a process that should take one day is taking five, and I am looking for solutions in areas that I do not really understand yet.
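
On the sort question, a hedged note: sort has to read all of its input before it can write anything, and typical implementations spill sorted runs to temporary files once the input outgrows their memory buffer. That may be the extra I/O seen here. The sketch below assumes a sort that accepts -S (buffer size) and -T (temporary directory), as GNU sort does. Whether the sort on this server supports them is an assumption to check against its man page, and the 64M value and sample rows are arbitrary.

```shell
#!/bin/sh
# Sketch only: -S and -T are assumed to be supported by the local sort.
tmpdir=$(mktemp -d)   # stand-in for a directory on a less busy disk

# A larger buffer (-S) means fewer temporary files; -T chooses where they go.
sorted=$(printf '10,2,1\n2,1,5\n2,1,3\n' |
    sort -S 64M -T "$tmpdir" -t ',' -k1,1n -k2,2n -k3,3n)
printf '%s\n' "$sorted"
rm -rf "$tmpdir"
```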

Best regards,
Ricardo Tomás
