Split list of files into an array and pass to function


 
# 8  
Old 01-13-2015
The code I posted above is working, with one caveat. In the function code,
Code:
function run_stats_program {

   # function args
   SET_F=$1
   FOLD_F=$2
   START_ELEMENT_F=$3
   NUMBER_OF_ELEMENTS_F=$4
   
   # get list of stats input files in fold directory
   STATS_INPUT_FILENAMES_F=($(ls  './'$SET_F'/'$FOLD_F'/'*'in.txt'))

   # create file list as a subset of STATS_INPUT_FILENAMES_F
   FILE_LIST=("${STATS_INPUT_FILENAMES_F[@]:$START_ELEMENT_F:$NUMBER_OF_ELEMENTS_F}")

   # get reference file name
   REFERENCE_FILE_F=$(ls './'$SET_F'/'$FOLD'/00_'$FOLD'_reference_'*'.txt')

   for INPUT_FILE in "${FILE_LIST[@]}"
   do
      # print current input file
      echo $INPUT_FILE
      #process stats input file
      './'$STATS_APP -r $REFERENCE_FILE_F -i $INPUT_FILE -l $BATCH_STOP_SUBSETS -s $BATCH_STOP_STATS -p $OA_PRINT_PRECISION -f $INPUT_FORMAT
      #delete stats input file
      rm -f $INPUT_FILE     
   done
}

My preference is to remove the files as they are processed, as the code in red (the rm -f line) indicates. This cannot be done as currently implemented because STATS_INPUT_FILENAMES_F is regenerated inside the function on every call; if files are deleted, the size of the resulting array changes between function calls, so the start offsets and element counts passed in no longer select the intended ranges.

If I want to delete files as processed, it would appear that I would need to create the sub-lists outside of the function and then pass in the arrays. That puts me back to passing in the arrays as arguments or waiting on deletion.
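For reference, a minimal sketch of the "build the sub-lists outside and pass them in" option: in bash, a slice of an array can be handed to a function as ordinary positional parameters. The names run_on_files and ALL_FILES and the sample values are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical sketch: pass a slice of an array to a function as "$@".

run_on_files() {
   local reference=$1   # first operand: reference file name
   shift                # the remaining operands are the file names
   local f
   for f in "$@"
   do
      echo "$reference $f"
   done
}

# The full list is built once, outside the function.
ALL_FILES=(a b c d e f)

# Pass elements 2..4 (three elements starting at index 2).
run_on_files Reference "${ALL_FILES[@]:2:3}"
```

Because the list is built once in the caller, deleting files inside the function does not shift the ranges used by later calls.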

Thoughts?

LMHmedchem
# 9  
Old 01-14-2015
As long as the file list array is not defined as a local variable in the parent shell, the subshells running the function don't need to redefine the array; it will be inherited.
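The inheritance point can be checked with a small, self-contained sketch. It assumes bash; the names process_slice, DEMO_FILES and demo_dir are made up for this demonstration, and it works in a throwaway directory created with mktemp.

```shell
#!/bin/bash
# Hypothetical demo: a global array defined in the parent shell is
# visible to functions run in the background, and deleting the files
# does not change the array itself.

demo_dir=$(mktemp -d)
touch "$demo_dir"/{a,b,c,d,e,f}_in.txt

# Snapshot the file list once, in the parent shell
# (a glob is used here instead of parsing ls output).
DEMO_FILES=("$demo_dir"/*in.txt)

process_slice() {
   local start=$1 count=$2 f
   # The background subshell inherits DEMO_FILES as it was at launch.
   for f in "${DEMO_FILES[@]:start:count}"
   do
      rm -f "$f"
   done
}

process_slice 0 3 &
process_slice 3 3 &
wait

remaining=$(find "$demo_dir" -type f | wc -l)
echo "array entries: ${#DEMO_FILES[@]}; files left on disk: $((remaining))"
rmdir "$demo_dir"
```

Since each invocation slices the same snapshot, the offsets stay valid even though the files disappear while the jobs run.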

Perhaps the following script will provide a useful example. It contains a revised version of your function and a new function that takes one operand specifying the number of invocations of your function to run concurrently. The new function splits the list of files into subsets, invokes your function on each subset, and waits for all invocations to complete:
Code:
#!/bin/bash
# Define functions...
function run_stats_program {

   # function args
   SET_F=$1
   FOLD_F=$2
   START_ELEMENT_F=$3
   NUMBER_OF_ELEMENTS_F=$4
   echo 'function run_stats_program called with args: ' "$@"
   
   # get reference file name
   #REFERENCE_FILE_F=$(ls './'$SET_F'/'$FOLD_F'/00_'$FOLD_F'_reference_'*'.txt')
   REFERENCE_FILE_F=Reference

   for INPUT_FILE in "${STATS_INPUT_FILENAMES[@]:START_ELEMENT_F:NUMBER_OF_ELEMENTS_F}"
   do
      # print current input file
      echo $INPUT_FILE
      #process stats input file
      echo './'$STATS_APP -r $REFERENCE_FILE_F -i $INPUT_FILE -l $BATCH_STOP_SUBSETS -s $BATCH_STOP_STATS -p $OA_PRINT_PRECISION -f $INPUT_FORMAT
      #delete stats input file
      echo rm -f $INPUT_FILE     
      sleep 1
   done
}

function split_and_run {
	NGROUPS="$1"

	# get number of files
	NUM_INPUT_FILES=${#STATS_INPUT_FILENAMES[@]}

	# Calculate number of files to be sent to each invocation of
	# run_stats_program..
	BASE_LIST_SIZE=$((NUM_INPUT_FILES / NGROUPS))
	LEFTOVER=$((NUM_INPUT_FILES % NGROUPS))
	SPLIT_START=0

	# Run NGROUPS copies of run_stats_program asynchronously...
	for ((n = 1; n <= NGROUPS; n++))
	do
		# The first LEFTOVER groups each get one extra file.
		GROUP_SIZE=$((BASE_LIST_SIZE + (LEFTOVER >= n)))
		run_stats_program "$SET" "$FOLD" $SPLIT_START $GROUP_SIZE&
		sleep 2
		SPLIT_START=$((SPLIT_START + GROUP_SIZE))
	done
	# Wait for run_stats_program invocations to finish...
	wait
}

# Initialize variables:
BATCH_STOP_STATS='batch_stop_stats_value'
BATCH_STOP_SUBSETS='batch_stop_subsets_value'
FOLD='fold_value'
INPUT_FORMAT='input_format_value'
OA_PRINT_PRECISION='oa_print_precision_value'
SET='set_value'
STATS_APP='stats_app_value'

# Collect list of file names
#STATS_INPUT_FILENAMES=($(ls  './'$SET'/'$FOLD'/'*'in.txt'))
STATS_INPUT_FILENAMES=(a b c d e f g h i j k l m n o p q r s t u v w x y z)

# Test run for dual processor system...
split_and_run 2
echo '*** 1st set done ***'
sleep 5
# Test run for quad processor system...
split_and_run 4
echo '*** 2nd set done ***'
sleep 5
# Test run for dual quad processor system...
split_and_run 8
echo '*** 3rd set done ***'

Note that I changed a couple of references to $FOLD in your function to refer to $FOLD_F instead. It isn't obvious to me whether $FOLD and $SET will be the same in all of your function calls. If they will be the same, you can probably drop the first two operands to your function and just inherit the values of $FOLD and $SET from the invoking shell. Similarly, if the reference file is the same in all invocations of your function, you can set it once in the invoking shell instead of duplicating that processing in each function invocation.

Note that if you might run this with fewer files than the number of concurrent invocations, you'll probably want to change:
Code:
		run_stats_program "$SET" "$FOLD" $SPLIT_START $GROUP_SIZE&
		sleep 2

to something more like:
Code:
		if [ $GROUP_SIZE -gt 0 ]
		then	run_stats_program "$SET" "$FOLD" $SPLIT_START $GROUP_SIZE&
			sleep 2
		fi
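To see why the guard matters, the split arithmetic from split_and_run can be run on its own. The helper show_groups below is hypothetical (it is not part of the script above); it only prints the start index and size computed for each group.

```shell
#!/bin/bash
# Hypothetical helper: print "start size" for each group, using the
# same arithmetic as split_and_run.

show_groups() {
   local num_files=$1 ngroups=$2
   local base=$((num_files / ngroups))
   local leftover=$((num_files % ngroups))
   local start=0 size n
   for ((n = 1; n <= ngroups; n++))
   do
      # The first $leftover groups each get one extra file.
      size=$((base + (leftover >= n)))
      echo "$start $size"
      start=$((start + size))
   done
}

echo "26 files, 4 groups:"
show_groups 26 4
echo "3 files, 4 groups:"
show_groups 3 4
```

With 26 files and 4 groups the sizes come out as 7, 7, 6 and 6. With 3 files and 4 groups the last group has size 0, which is the case the test on GROUP_SIZE skips.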

If this sample code looks like it is doing what you want, remove (or replace) the code in red to use your real data.