Alternative solution to nested loops in shell programming

#1  09-21-2015  Sandeep Pattnai (Registered User)

1. The problem statement, all variables and given/known data:

Hi,

The problem statement is: I am reading a flat file line by line in a while loop. The file contains 100k records, and each record has 25 columns. For each line, I have to take the field names held in an array and map them to the corresponding fields extracted from that line. I tried a for loop inside the while loop, but that kills performance. Is there an alternative approach that avoids the nested loops? Any help would be greatly appreciated.


2. Relevant commands, code, scripts, algorithms:

Command to run the script:

Create_Index.ksh <config_file> "ABC" 1

The file Indexfields_1 contains the comma-separated field names for which the mapping needs to be created.

E.g.: "A","B","C","D", ... and so on, 25 fields in all.


Code:
#!/usr/bin/ksh

if [[ $# != 3 ]];then
        echo "Incorrect number of arguments sent to script"
        echo "Usage: Create_Index.ksh <config_file_name> <table_identifier> <segment_number>"
        echo "Insufficient parameters to continue execution. Exiting the $(basename ${0}) script with 1 at $(date)"
        exit 1
fi


config_file=${1}

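if [ -s "${config_file}" ]
then
        . "${config_file}"
else
        # log() and its log file are not defined yet at this point,
        # so report on stderr and stop instead of calling log
        echo "Config file not found or empty: ${config_file}" >&2
        exit 1
fi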


#-------------------------------------
# function to log message to log file
#-------------------------------------
function log
{
        msg="$1"

        echo "== $(date '+%m/%d/%Y %H:%M:%S')  :${msg}" >>${IndexCreation_DAILY_LOG}
}

#-------------------------------------
# function ends
#------------------------------------


base_dir="${BASE_DIR}"
afp_dir="${AFP_DIR}"
index_dir="${INDEX_DIR}/$2/$2$3"
log_dir="${LOG_DIR}/$2/$2$3"
trigger_dir="${TRIGGER_DIR}/$2/$2$3"
log_filename_suffix="${LOG_FILENAME_SUFFIX}"
output_file_path="${OUTPUT_FILE_PATH}/$2$3"
IndexCreation_DAILY_LOG=${log_dir}/${log_filename_suffix}.$(date +%m%d%y_%H%M%S)
metadata_file_name="${METADATA_FILENAME}"
trigger_file_prefix=`basename ${metadata_file_name%.dat}`
trigger_file_name="${trigger_file_prefix}.indexing"


if [[ ! -d "${log_dir}" ]];then
mkdir -p "${log_dir}"
fi

#rm -rf ${index_dir}/*

if [[ ! -d "${index_dir}" ]];then
mkdir -p "${index_dir}"
fi

if [[ ! -d "${afp_dir}" ]];then
mkdir -p "${afp_dir}"
fi

log "**********************************************************************************"
log "********Script**started**at***$(date '+%m/%d/%Y %H:%M:%S')************************"
log "**********************************************************************************"

rm -rf ${index_dir}/*

if [ $? != 0 ]
then
log "Unable to delete the old index files. Indexing failed, so creating failed trigger"
> ${trigger_dir}/${trigger_file_prefix}.indexfailed
exit 1

else
log "Successfully deleted the old index files from the directory ${index_dir}"
fi

identifier=$2
declare -i i=1
declare -i outfilecount=0

#Fetches the index values for the identifier passed in the argument
grep $identifier Indexfields_1 > tempfile1
indexfieldsnumber=`awk 'BEGIN {FS=","} ; END{print NF}' tempfile1`
log "fields to be present in index file are $indexfieldsnumber"
cat tempfile1


#Populates the fetched index values from previous step in an array.
declare -i j=1
declare -i k=0
while [[ $j -le $indexfieldsnumber ]] ; do
indexfieldname=`cut -d "," -f${j} tempfile1`
array[${k}]="$indexfieldname"
j=$j+1
k=$k+1

done
#Finished populating the index fields values for an identifier in the array.

declare -i outfilecount=0
declare -i numberoflinesread=0
declare -i linenumber=0 #debug purpose

while read line #read the metadata file
do

record="$line"
#record=$(echo "${record}" | tr -d '[[:space:]]')

declare -i mdfieldcount=0
declare -i arrayfieldnum=0

for fieldposition in "${array[@]}" #read the field name
        do

      #  groupfieldvalue=`echo ${line} | cut -d , -f${mdfieldcount}`

        #echo "fieldposition is $fieldposition and value is $groupfieldvalue"


        if [[ ${fieldposition} != ${2} ]]
        then
        groupfieldvalue=`echo "${line}" | cut -d , -f${mdfieldcount}`
        groupfieldvalue=$(echo "${groupfieldvalue}" | tr -d '[:space:]')

#       if [[ $? != 0 ]]
#       then
#       log "unable to find the group field value for ${fieldposition}"
#       mv ${trigger_file_name} ${trigger_file_prefix}.failed
#       fi

                if [[ ${fieldposition} != "${DOCUMENT_NAME}" && ${fieldposition} != "${DOCUMENT_OFFSET}" && ${fieldposition} != "${DOCUMENT_LENGTH}" && ${fieldposition} != "${COMP_OFFSET}" && ${fieldposition} != "${COMP_LENGTH}" ]]
                then
                        echo  "GROUP_FIELD_NAME:${fieldposition}" >> ${index_dir}/afp${i}.ind
                        echo  "GROUP_FIELD_VALUE:${groupfieldvalue}" >> ${index_dir}/afp${i}.ind
                fi
        fi

        if [[ ${fieldposition} == "${DOCUMENT_NAME}" ]]
        then
        docname=${groupfieldvalue}
        docname="$(echo "$docname" | tr -d ' ')"
        fi

        if [[ ${fieldposition} == "${DOCUMENT_OFFSET}" ]]
        then
        docoff=${groupfieldvalue}
        fi

        if [[ ${fieldposition} == "${DOCUMENT_LENGTH}" ]]
        then
        doclen=${groupfieldvalue}
        fi

        if [[ ${fieldposition} == "${COMP_LENGTH}" ]]
        then
        complength=${groupfieldvalue}
        fi

        if [[ ${fieldposition} == "${COMP_OFFSET}" ]]
        then
        compoffset=${groupfieldvalue}
        fi

        filename="Decomp_${docname}_${compoffset}_${complength}.out"
        indexfilename="Decomp_${docname}_${compoffset}_${complength}.ind"
        filename=$(echo "${filename}" | tr -d '[:space:]')
        indexfilename=$(echo "${indexfilename}" | tr -d '[:space:]')
        currentfilename=$filename

        if [[ $previousfilename != $currentfilename ]]
        then
        newcompoffset=true

        fi

        mdfieldcount=${mdfieldcount}+1 #Increment the metadata field count to fetch the next value from the metadata file

        done

        echo "GROUP_OFFSET:${docoff}" >> ${index_dir}/afp${i}.ind
        echo "GROUP_LENGTH:${doclen}" >> ${index_dir}/afp${i}.ind
        echo "GROUP_FILENAME:${output_file_path}/${filename}" >> ${index_dir}/afp${i}.ind


        #debug purpose only

        if [[ $linenumber == 5000 ]]; then

        i=i+1
        linenumber=0
        echo  "CODEPAGE:850" >> ${index_dir}/afp${i}.ind

        fi


        #debug purpose only

       echo "finished processing for $linenumber"
       linenumber=linenumber+1


done < ${metadata_file_name}

log "removing the temp file containing the indexed fields"
rm -f tempfile1
rm -rf  ${index_dir}/afp*.ind

mv "${trigger_dir}/${trigger_file_prefix}.indexinprogress" "${trigger_dir}/${trigger_file_prefix}.indexed"

log "*************************************************************************************************"
log "********Script***completed**at***$(date '+%m/%d/%Y %H:%M:%S')*************************************"
log "*************************************************************************************************"

3. The attempts at a solution (include all code and scripts):

Included.

4. Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):
Utkal University, IND.

Last edited by Sandeep Pattnai; 09-21-2015 at 01:52 PM..

#2  09-21-2015  jim mcnamara (Forum Staff)
Please provide the information for #4 above - THANK YOU.

PS: you invoke ksh but seem to have some bash code in your example. It will not run.
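
The bash-specific construct in the script appears to be declare -i, a bash builtin that ksh generally does not provide. A minimal sketch of the same kind of integer counter written with typeset -i, which both bash and ksh93 accept; the variable names are illustrative and not taken from the script:

```shell
#!/usr/bin/env bash
# typeset -i gives a variable the integer attribute, so the right-hand side
# of an assignment is evaluated as arithmetic (same effect as bash's declare -i).
typeset -i j=1 total=0

while (( j <= 3 )); do
    total=total+j   # arithmetic addition, not string concatenation
    j=j+1
done

echo "total=$total"
```

With the integer attribute set, the loop adds 1, 2 and 3, so the script prints total=6.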

#3  09-21-2015  Sandeep Pattnai (Registered User)
Jim,

The code runs, but the performance is slow. The for loop inside the while loop is causing the issue. It would be great if you could suggest an alternative approach that avoids this nested loop.

#4  09-21-2015  MadeInGermany (Forum Advisor)
True, the code is not ksh. I guess /usr/bin/ksh is a link to bash on that system:

Code:
$ ls -l /usr/bin/ksh
... -> /bin/bash

#4 the School/University is still missing!
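
If /usr/bin/ksh is indeed a link to bash, the script runs as bash despite its shebang line. A small sketch of how a script could report which interpreter is actually executing it; it relies on the BASH_VERSION and KSH_VERSION variables that those shells set, and the name shell_name is made up for illustration:

```shell
# Report the interpreter that is really running this script,
# regardless of what the shebang line or the file name suggests.
if [ -n "${BASH_VERSION:-}" ]; then
    shell_name="bash ${BASH_VERSION}"
elif [ -n "${KSH_VERSION:-}" ]; then
    shell_name="ksh ${KSH_VERSION}"
else
    shell_name="unknown"
fi

echo "running under: ${shell_name}"
```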

#5  09-21-2015  RudiC (Forum Staff)
Some data samples might help. Wouldn't a performance / time profile make sense?
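
One rough way to get such a profile is bash's time keyword around each candidate hot spot. The sketch below times 200 whitespace-stripping operations done with an external tr against the same work done with a builtin parameter expansion. The sample string, the iteration count and the function names are arbitrary choices for illustration, and the measured times depend on the machine:

```shell
#!/usr/bin/env bash
sample=' a b c '

# One pipeline (subshell plus tr process) per iteration, as in the posted script.
strip_with_tr() {
    for ((n = 0; n < 200; n++)); do
        out_tr=$(echo "$sample" | tr -d '[:space:]')
    done
}

# Same result using only the shell's own parameter expansion.
strip_with_builtin() {
    for ((n = 0; n < 200; n++)); do
        out_builtin=${sample//[[:space:]]/}
    done
}

# The time keyword writes real/user/sys figures to stderr.
time strip_with_tr
time strip_with_builtin
```

Scaling the difference up to 100k records times 25 fields gives an estimate of how much of the run time goes into starting processes rather than into the loop itself.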

#6  09-21-2015  Corona688 (Forum Staff)
It doesn't look like it's the loop that's the problem, to me. It's the creation of all those tiny files, and all those external tr -d calls, and the >> re-opening the same file over and over and over.
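
Each of those costs can be avoided without leaving the shell. A sketch, assuming bash (which the thread suggests is what actually runs the script): read -a splits the line once, a parameter expansion strips whitespace without starting tr, and a single redirection on the loop opens the output file once. The field names, sample records and file names below are invented for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical inputs; the real field names and metadata layout are not shown in the thread.
names=(DOC_NAME DOC_OFFSET CUSTOMER)
workdir=$(mktemp -d)
printf '%s\n' 'a 1.afp, 0 ,Smith ' 'b2.afp,512, Jones' > "$workdir/meta.dat"

# read -a splits the record on commas in one step (no cut per field).
while IFS=, read -r -a fields; do
    for idx in "${!names[@]}"; do
        # Builtin whitespace removal: no tr, no subshell.
        value=${fields[idx]//[[:space:]]/}
        printf 'GROUP_FIELD_NAME:%s\nGROUP_FIELD_VALUE:%s\n' "${names[idx]}" "$value"
    done
# The output file is opened once for the whole loop instead of once per echo.
done < "$workdir/meta.dat" > "$workdir/afp1.ind"
```

The nested loop is still there, but no iteration starts an external process, which is the point made above.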

#7  09-21-2015  Sandeep Pattnai (Registered User)
Hi Corona688/RudiC,

As per your suggestion, I have replaced tr -d with sed to remove the spaces, but I am still seeing the same performance. Could you please suggest an alternative solution to this problem?

Thanks
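
Replacing tr with sed still starts one external process per field, so it would not be expected to change the timing much. One alternative that no one in the thread proposed is to hand the whole file to a single awk process, which splits each record itself. A sketch under assumed inputs: the field names, sample records and file names are invented, and the real script's special handling of the document name, offset and length fields is omitted:

```shell
# Hypothetical inputs standing in for the metadata file and the Indexfields_1 entry.
workdir=$(mktemp -d)
printf '%s\n' 'a 1.afp, 0 ,Smith ' 'b2.afp,512, Jones' > "$workdir/meta.dat"
names='DOC_NAME,DOC_OFFSET,CUSTOMER'

# One awk process reads every record; -F, makes awk split fields on commas.
awk -F, -v names="$names" '
BEGIN { n = split(names, name, ",") }   # name[1..n] holds the field names
{
    for (i = 1; i <= n; i++) {
        value = $i
        gsub(/[ \t\r]+/, "", value)     # strip blanks from the copy only
        print "GROUP_FIELD_NAME:" name[i]
        print "GROUP_FIELD_VALUE:" value
    }
}' "$workdir/meta.dat" > "$workdir/afp1.ind"
```

The per-field loop still exists, but it runs inside awk, so the cost per record no longer includes starting new processes or reopening the output file.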