Alternative solution to nested loops in shell programming


1. The problem statement, all variables and given/known data:


I am reading a flat file line by line with a while loop. The file contains 100k records, and each record has 25 columns. For each line, I have to read values from an array and map each array value to the corresponding field extracted from the line. I tried using a for loop inside the while loop, but that is killing the performance. Is there an alternative approach that avoids the nested loops? Any help would be greatly appreciated.

2. Relevant commands, code, scripts, algorithms:

Command to run the script:

Create_Index.ksh <config_file> "ABC" 1

Indexfields_1 contains the comma-separated field names for which the mapping needs to be created.

E.g.: "A","B","C","D", ... and so on, for 25 fields


if [[ $# != 3 ]];then
        echo "Incorrect number of arguments sent to script"
        echo "Usage: Create_Index.ksh <config_file_name> <table_identifier> <segment_number>"
        echo "Insufficient parameters to continue execution. Exiting the $(basename ${0}) script with 1 at $(date)"
        exit 1
fi


if [ -s "${config_file}" ]
then
        . "${config_file}"
else
        log "Config file not found"
fi

# function to log message to log file
function log
{
        msg=$1
        echo "== $(date '+%m/%d/%Y %H:%M:%S')  :${msg}" >>${IndexCreation_DAILY_LOG}
}

# function ends

IndexCreation_DAILY_LOG=${log_dir}/${log_filename_suffix}.$(date +%m%d%y_%H%M%S)
trigger_file_prefix=`basename ${metadata_file_name%.dat}`

if [[ ! -d "${log_dir}" ]];then
        mkdir -p "${log_dir}"
fi

#rm -rf ${index_dir}/*

if [[ ! -d "${index_dir}" ]];then
        mkdir -p "${index_dir}"
fi

if [[ ! -d "${afp_dir}" ]];then
        mkdir -p "${afp_dir}"
fi

log "**********************************************************************************"
log "********Script**started**at***$(date '+%m/%d/%Y %H:%M:%S')************************"
log "**********************************************************************************"

rm -rf ${index_dir}/*

if [ $? != 0 ]
then
        log "Unable to delete the old index files. Indexing failed, so creating failed trigger"
        > ${trigger_dir}/${trigger_file_prefix}.indexfailed
        exit 1
fi

log "Successfully deleted the old index files from the directory ${index_dir}"

declare -i i=1
declare -i outfilecount=0

#Fetches the index values for the identifier passed in the argument
grep $identifier Indexfields_1 > tempfile1
indexfieldsnumber=`awk 'BEGIN {FS=","} ; END{print NF}' tempfile1`
log "fields to be present in index file are $indexfieldsnumber"
cat tempfile1

#Populates the fetched index values from previous step in an array.
declare -i j=1
declare -i k=0
while [[ $j -le $indexfieldsnumber ]] ; do
        indexfieldname=`cut -d "," -f${j} tempfile1`
        array[$k]=${indexfieldname}
        j=${j}+1
        k=${k}+1
done

#Finished populating the index fields values for an identifier in the array.

declare -i outfilecount=0
declare -i numberoflinesread=0
declare -i linenumber=0 #debug purpose

while read line #read the metadata file
do

#record=$(echo "${record}" | tr -d '[[:space:]]')

declare -i mdfieldcount=0
declare -i arrayfieldnum=0

for fieldposition in "${array[@]}" #read the field name

      #  groupfieldvalue=`echo ${line} | cut -d , -f${mdfieldcount}`

        #echo "fieldposition is $fieldposition and value is $groupfieldvalue"

        if [[ ${fieldposition} != ${2} ]]
        groupfieldvalue=`echo ${line} | cut -d , -f${mdfieldcount}`
        groupfieldvalue=$(echo "${groupfieldvalue}" | tr -d '[[:space:]]')

#       if [[ $? != 0 ]]
#       then
#       log "unable to find the group field value for ${fieldposition}"
#       mv ${trigger_file_name} ${trigger_file_prefix}.failed
#       fi

                if [[ ${fieldposition} != "${DOCUMENT_NAME}" && ${fieldposition} != "${DOCUMENT_OFFSET}" && ${fieldposition} != "${DOCUMENT_LENGTH}" && ${fieldposition} != "${COMP_OFFSET}" && ${fieldposition} != "${COMP_LENGTH}" ]]
                then
                        echo  "GROUP_FIELD_NAME:${fieldposition}" >> ${index_dir}/afp${i}.ind
                        echo  "GROUP_FIELD_VALUE:${groupfieldvalue}" >> ${index_dir}/afp${i}.ind
                fi

        if [[ ${fieldposition} == "${DOCUMENT_NAME}" ]]
        docname="$(echo "$docname" | tr -d ' ')"

        if [[ ${fieldposition} == "${DOCUMENT_OFFSET}" ]]

        if [[ ${fieldposition} == "${DOCUMENT_LENGTH}" ]]

        if [[ ${fieldposition} == "${COMP_LENGTH}" ]]

        if [[ ${fieldposition} == "${COMP_OFFSET}" ]]

        filename=$(echo "${filename}" | tr -d '[[:space:]]')
        indexfilename=$(echo "${indexfilename}" | tr -d '[[:space:]]')

        if [[ $previousfilename != $currentfilename ]]


        mdfieldcount=${mdfieldcount}+1 #Increment the metadata field count to fetch the next value from the metadata file


        echo "GROUP_OFFSET:${docoff}" >> ${index_dir}/afp${i}.ind
        echo "GROUP_LENGTH:${doclen}" >> ${index_dir}/afp${i}.ind
        echo "GROUP_FILENAME:${output_file_path}/${filename}" >> ${index_dir}/afp${i}.ind

        #debug purpose only

        if [[ $linenumber == 5000 ]]; then

        echo  "CODEPAGE:850" >> ${index_dir}/afp${i}.ind


        #debug purpose only

       echo "finished processing for $linenumber"

done < ${metadata_file_name}

log "removing the temp file containing the indexed fields"
rm -rf tempfile1
rm -rf  ${index_dir}/afp*.ind

mv "${trigger_dir}/${trigger_file_prefix}.indexinprogress" "${trigger_dir}/${trigger_file_prefix}.indexed"

log "*************************************************************************************************"
log "********Script***completed**at***$(date '+%m/%d/%Y %H:%M:%S')*************************************"
log "*************************************************************************************************"

3. The attempts at a solution (include all code and scripts):


Utkal University, IND.

Please provide the information for #4 above - THANK YOU.

PS: you invoke ksh but seem to have some bash code in your example. It will not run.
The code runs, but the performance is slow. The for loop inside the while loop is causing the issue. It would be great if you could provide an alternative approach that avoids this nested loop.
True, the code is not ksh. I guess ksh is a symlink to bash on that system:
$ ls -l /usr/bin/ksh
... -> /bin/bash

#4 the School/University is still missing!
Some data samples might help. Wouldn't a performance / time profile make sense?
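A rough way to get such a profile, assuming bash: time the same whitespace strip done once with an external tr and once with the shell's own parameter expansion. The function names, the sample value and the iteration count below are made up for illustration and are not from the posted script.

```shell
#!/bin/bash
# Sketch only: sample value and iteration count are arbitrary assumptions.
value='  some field value  '
iterations=500

strip_with_tr() {
    local n out
    for ((n = 0; n < iterations; n++)); do
        # Starts a subshell and a tr process on every iteration.
        out=$(echo "$value" | tr -d '[:space:]')
    done
    echo "$out"
}

strip_with_builtin() {
    local n out
    for ((n = 0; n < iterations; n++)); do
        # Parameter expansion is done inside the shell, no new process.
        out=${value//[[:space:]]/}
    done
    echo "$out"
}

# bash prints the timings on stderr after each call.
time strip_with_tr
time strip_with_builtin
```

Running the real script under time with a small slice of the metadata file (for example the first 1000 lines) would give a comparable picture for the whole job.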
It doesn't look like it's the loop that's the problem, to me. It's the creation of all those tiny files, and all those external tr -d calls, and the >> re-opening the same file over and over and over.
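To illustrate the points about external calls and repeated >> opens, here is a minimal bash sketch. It assumes comma-separated records with no quoted commas; the field names, the sample records and the temporary output file are hypothetical and not taken from the poster's data.

```shell
#!/bin/bash
# Sketch only: field names, sample records and output file are hypothetical.

# Split the index field names once, using the shell's own word splitting.
IFS=',' read -r -a names <<< 'ACCOUNT,DOCUMENT_NAME,REGION'

write_index() {
    local line i value
    local -a fields
    while IFS= read -r line; do
        # Split the record inside the shell instead of echo | cut per field.
        IFS=',' read -r -a fields <<< "$line"
        for i in "${!names[@]}"; do
            # Strip whitespace with parameter expansion instead of tr or sed.
            value=${fields[i]//[[:space:]]/}
            printf 'GROUP_FIELD_NAME:%s\nGROUP_FIELD_VALUE:%s\n' "${names[i]}" "$value"
        done
    done
}

out=$(mktemp)
# One redirection for the whole loop, so the output file is opened only once.
printf '%s\n' ' 1001 , a.afp , EU' '1002,b.afp, US ' | write_index > "$out"
cat "$out"
```

The nested loop is still there, but each iteration now stays inside the shell, which is the cost the reply above is pointing at.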
Hi Corona688/Rudi C,

As per your suggestion, I have replaced tr -d with sed to remove the spaces, but I am still seeing the same performance. Could you please suggest an alternative solution to this problem?
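sed is also an external program, so swapping it in for tr still starts a new process for every field, which would explain why nothing changed. One alternative, sketched here under the assumption of comma-separated records without quoted commas, is to let a single awk process do the whole mapping. The field names and sample records are hypothetical.

```shell
#!/bin/bash
# Sketch only: field names and sample records are hypothetical.
names='ACCOUNT,DOCUMENT_NAME,REGION'

build_index() {
    awk -F',' -v names="$names" '
        BEGIN { n = split(names, name, ",") }   # load the field names once
        {
            for (i = 1; i <= n; i++) {
                value = $i
                gsub(/[ \t\r]+/, "", value)     # strip whitespace inside awk
                print "GROUP_FIELD_NAME:" name[i]
                print "GROUP_FIELD_VALUE:" value
            }
        }
    '
}

printf '%s\n' ' 1001 , a.afp , EU' '1002,b.afp, US ' | build_index
```

In the real script the function would read the metadata file and write to the index file with one redirection, e.g. build_index < "${metadata_file_name}" > "${index_dir}/afp${i}.ind".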
