KSH shell: problem calculating the number of lines inside a compressed file


 
# 1  
Old 01-27-2011

Hi Gurus,
I am working on a Korn shell script that counts the number of lines inside compressed files.
The function below is called inside a loop:

Code:
 
#########################################
# Function: CheckCount
#########################################
function CheckCount
{
CDRHDTR=2    # lines subtracted from each count (presumably header + trailer)
CDRI=0
for xx in ${DIRLIST}
do
        cd ${OUTDIR_MOD1}${xx}
        LISTO=`cat ${dir_log}LIST_1_$RepDate.tmp`
        # $LISTO is unquoted, so each "name*" entry is glob-expanded here
        for fileo in $LISTO
        do
                if [[ ! -f "${fileo}" ]]
                then
                        # no matching file: record a count of 0
                        CDRO_TOT=0
                        file_upd=`echo $fileo | cut -d"*" -f1`
                        UpdateOut $CDRO_TOT ${file_upd}"%"
                else
                        # count the lines of the compressed file, minus CDRHDTR
                        CDRO=`gzcat ${fileo} | wc -l | awk '{print $1}'`
                        CDRO_TOT=`echo "$CDRO - $CDRHDTR" | bc`
                        file_upd=`echo $fileo | cut -d"*" -f1`
                        UpdateOut $CDRO_TOT ${file_upd}
                fi
        done
done
}

Unfortunately the performance is very bad, and in some cases it even
produces a wrong count!

Any ideas on how to optimize the job and make the calculation reliable?

Thanks,
Germano
# 2  
Old 01-27-2011
Quote:
for xx in ${DIRLIST}
do

cd ${OUTDIR_MOD1}${xx}
LISTO=`cat ${dir_log}LIST_1_$RepDate.tmp`
for fileo in $LISTO
Later in the script there is code that suggests that $fileo can contain asterisk characters. It is very important that we look at what gets put into this variable.

What is in ${DIRLIST} ?
What is in ${OUTDIR_MOD1} ?
What is in ${dir_log} ?
What is in the file LIST_1_$RepDate.tmp ?

How many directories, how many files, how big?
# 3  
Old 01-27-2011
I read the parameters from a configuration file like this:

Code:
##### Constants
# recover properties from file
. ../cfg/init.cfg

these are the values:

Code:
 
DIRLIST="20101227 20101228 20101229"
OUTDIR_MOD1="/data/remote/filter/output/"
dir_log="/data/remote/filter/log/"

The script works on 44 directories and around 660,000 compressed files, each containing about 1,500 rows.
LIST_1_$RepDate.tmp (for example LIST_1_20110126_102312.tmp) contains the list of files
to process. This is part of its content:

Code:
 
NC_AN01MSC_GSM_20101227_5493*
NC_AN01MSC_GSM_20101227_5507*
NC_AN01MSC_GSM_20101227_5508*

The "*" character at the end of each line is there on purpose, because this format
is needed by a later SQL operation:
Code:
#########################################
# Function: UpdateOut
#########################################
UpdateOut()
{
isql -U$USER_DB -P$PWD_OPER -S$SERVER -D$DB_OPER w1000 << THEEND >> ${LogFILENAME}
set nocount on
go
UPDATE $DB_OPER..KPI_FILTERED set NUM_CDR_OUTPUT=${1}
WHERE FILE_NAME LIKE '${2}'
go
quit
THEEND
}

All of this works correctly and the variables arrive clean. I think the problem is
this operation:

Code:
CDRO_TOT=`echo "$CDRO - $CDRHDTR" | bc`

Do you know a faster and more efficient method (without expr)?

Thank you in advance,
Germano

Last edited by Scott; 01-27-2011 at 01:05 PM.. Reason: Added another code tag
# 4  
Old 01-27-2011
What Operating System and version do you have? Many systems would not process this script with 660,000 filenames on the command line. Or are you testing with a small number of directories and a small number of files?
It is unclear whether each of the 44 directories contains 660,000 files with their names listed in the files list file ... or whether there are just 660,000 files spread across 44 directories.
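
Whatever the limit is on a given system, the list does not have to be expanded into one long word list at all. As a sketch only: the list file can be read one entry at a time. `process_list` is a hypothetical name, not part of the original script, and the `printf` line stands in for the real per-file work.

```shell
# Hypothetical sketch: walk the list file one entry at a time.
# The whole list is never expanded into a single word list, and each entry
# stays a literal string (trailing "*" included) because "$entry" is quoted.
process_list() {
    listfile=$1
    n=0
    while IFS= read -r entry
    do
        [ -n "$entry" ] || continue      # skip blank lines
        n=$(( n + 1 ))
        printf '%s\n' "$entry"           # placeholder for the real per-file work
    done < "$listfile"
    echo "entries read: $n" >&2
}
```

It would be called as `process_list "${dir_log}LIST_1_$RepDate.tmp"`. Because nothing is glob-expanded implicitly, the script can decide explicitly when the `*` should be treated as a pattern.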

There appear to be much bigger problems with this script than the speed of arithmetic in "bc".
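
On the narrow question of the arithmetic: ksh evaluates integer expressions itself with `$(( ))`, so neither `bc` nor `expr` needs to be started. A minimal sketch follows, where `count_records` is a hypothetical helper name and `gzip -dc` is used as a portable stand-in for `gzcat`:

```shell
# Lines to subtract from every count, as in the original script.
CDRHDTR=2

# Hypothetical helper: print the number of lines in one gzip-compressed
# file, minus CDRHDTR.
count_records() {
    # Reading from a pipe, wc -l prints only the number, so awk is not needed.
    total=$(gzip -dc "$1" | wc -l)
    # $(( )) is evaluated by the shell itself: no bc, no expr.
    echo $(( total - CDRHDTR ))
}
```

Inside the loop this would be used as `CDRO_TOT=$(count_records "$fileo")`. It saves two process launches per file, which is small compared with the cost of decompressing each file and starting `isql`.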

We must understand whether the names of the actual files on disc have this trailing asterisk character. The way the script is written makes me think that they do.
What does the directory listing for, say, these three files look like?
NC_AN01MSC_GSM_20101227_5493*
NC_AN01MSC_GSM_20101227_5507*
NC_AN01MSC_GSM_20101227_5508*

Also, is the number at the end of the file name variable length?
For example, are all these names valid?
NC_AN01MSC_GSM_20101227_5*
NC_AN01MSC_GSM_20101227_54*
NC_AN01MSC_GSM_20101227_549*
NC_AN01MSC_GSM_20101227_5493*
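
To illustrate why the variable-length question matters: a short prefix followed by `*` also matches every longer name that starts with it, so one list entry may expand to several files. The sketch below uses a hypothetical helper, `count_matches`, to show how many existing files a single entry matches in the current directory.

```shell
# Hypothetical helper: print how many existing files one list entry matches.
# $1 is a pattern such as NC_AN01MSC_GSM_20101227_5*
count_matches() {
    # Deliberately unquoted: let the shell expand the pattern.
    set -- $1
    # When nothing matches, the shell leaves the pattern unchanged,
    # so check that the first word really exists.
    if [ -e "$1" ]; then
        echo $#
    else
        echo 0
    fi
}
```

If names of different lengths do exist, `for fileo in $LISTO` would visit the longer-named files again under each shorter prefix. In the not-found branch, `LIKE '..._5%'` would also match the database rows for the longer names, which could be one source of wrong counts.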



The main design issue is that "isql" is loaded for every file listed in the files list file. This will be very slow.
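
One possible shape for avoiding that, as a sketch under assumptions rather than a finished solution: write every UPDATE into a single SQL batch file and start `isql` once at the end. `build_batch` and `run_batch` are hypothetical names. The input format (one "count pattern" pair per line) is also an assumption. The connection options are copied from `UpdateOut` in this thread.

```shell
# Hypothetical: read "count pattern" pairs on stdin and print one SQL batch.
# DB_OPER is expected to be set already (from init.cfg in the original script).
build_batch() {
    printf '%s\n' "set nocount on" "go"
    while read -r count pattern
    do
        printf '%s\n' "UPDATE $DB_OPER..KPI_FILTERED set NUM_CDR_OUTPUT=$count"
        printf '%s\n' "WHERE FILE_NAME LIKE '$pattern'"
        printf '%s\n' "go"
    done
    printf '%s\n' "quit"
}

# Hypothetical: run the finished batch file with a single isql process.
run_batch() {
    isql -U"$USER_DB" -P"$PWD_OPER" -S"$SERVER" -D"$DB_OPER" < "$1" >> "$LogFILENAME"
}
```

In the counting loop, each `UpdateOut` call would be replaced by appending a line such as `echo "$CDRO_TOT $file_upd"` to a work file. After the loop, `build_batch < workfile > batch.sql` followed by `run_batch batch.sql` would apply all updates in one session.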

Once we get it clear how many files there are and what their names are, we can look at whether this job is feasible in Shell.