Split list of files into an array and pass to function


 
# 1  
Old 01-13-2015
Split list of files into an array and pass to function

There are two parts to this. In the first part I need to read a list of files from a directory and split it into 4 arrays. I have done that with the following code,
Code:
# collect list of file names
STATS_INPUT_FILENAMES=($(ls  './'$SET'/'$FOLD'/'*'in.txt'))
# get number of files
NUM_INPUT_FILES=${#STATS_INPUT_FILENAMES[@]}


# get size of each subset
PROC_SIZE=$((NUM_INPUT_FILES / 4))
# create array start and stop positions
let "START2 = $PROC_SIZE+1"; let "STOP2 = $START2+$PROC_SIZE"; 
let "START3 = $STOP2+1"; let "STOP3 = $START3+$PROC_SIZE";
let "START4 = $STOP3+1"

# create 4 arrays, each with 25% of filenames
NUM_FILE_LISTS='4'
FILE_LIST_0=("${STATS_INPUT_FILENAMES[@]:0:$PROC_SIZE}")
FILE_LIST_1=("${STATS_INPUT_FILENAMES[@]:$START2:$STOP2}")
FILE_LIST_2=("${STATS_INPUT_FILENAMES[@]:$START3:$STOP3}")
FILE_LIST_3=("${STATS_INPUT_FILENAMES[@]:$START4:$NUM_INPUT_FILES}")

This is not very elegant, but I think it has the list split up.

Next, I need to pass each of the 4 lists to a bash function, but I can't seem to find a reasonable syntax for doing that. Suggestions would be appreciated.

LMHmedchem
# 2  
Old 01-13-2015
It may have the list split up, but the 4 lists are nowhere close to containing the same number of elements. The construct for grabbing a subset of the array elements in ksh93 and recent versions of bash is not:
Code:
${array[@]:start_index:end_index}

it is:
Code:
${array[@]:start_index:number_of_elements}
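
To make the difference concrete, here is a small sketch using a made-up array (the names `letters`, `middle`, and `tail_part` are only for illustration). The second number is a count of elements, not a stop position, and leaving it out takes everything to the end:

```shell
#!/bin/bash
# Hypothetical sample data, only for illustration.
letters=(a b c d e f g h)

# Offset 2, length 3: three elements starting at index 2.
middle=("${letters[@]:2:3}")

# Offset 5, no length: everything from index 5 to the end.
tail_part=("${letters[@]:5}")

echo "${middle[*]}"      # c d e
echo "${tail_part[*]}"   # f g h
```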

If we had a list of 10 files named 1 through 10 the four lists created by your code would be:
Code:
2:1 2
5:4 5 6 7 8
4:7 8 9 10
1:10

where the number before the colon is the number of files in the list and the numbers after the colon are the files in that list. (Note that files 7, 8, and 10 are in two lists and file 3 isn't in any list, and the number of files in the lists are 2, 5, 4, and 1.)

To get more even lists (and get each file in exactly one of your four lists), you could try something more like:
Code:
# collect list of file names
STATS_INPUT_FILENAMES=($(ls  './'$SET'/'$FOLD'/'*'in.txt'))
STATS_INPUT_FILENAMES=(1 2 3 4 5 6 7 8 9 10) # For testing only.

# get number of files
NUM_INPUT_FILES=${#STATS_INPUT_FILENAMES[@]}

# create 4 arrays, each with ~25% of filenames
NUM_FILE_LISTS='4'
# get size of each subset
BASE_LIST_SIZE=$(((NUM_INPUT_FILES) / NUM_FILE_LISTS))
LEFTOVER=$((NUM_INPUT_FILES % NUM_FILE_LISTS))
LIST_SIZE0=$((BASE_LIST_SIZE + (LEFTOVER > 0)))
LIST_SIZE1=$((BASE_LIST_SIZE + (LEFTOVER > 1)))
LIST_SIZE2=$((BASE_LIST_SIZE + (LEFTOVER > 2)))

FILE_LIST_0=("${STATS_INPUT_FILENAMES[@]:0:$LIST_SIZE0}")
FILE_LIST_1=("${STATS_INPUT_FILENAMES[@]:$LIST_SIZE0:$LIST_SIZE1}")
FILE_LIST_2=("${STATS_INPUT_FILENAMES[@]:$((LIST_SIZE0 + LIST_SIZE1)):$LIST_SIZE2}")
FILE_LIST_3=("${STATS_INPUT_FILENAMES[@]:$((LIST_SIZE0 + LIST_SIZE1 + LIST_SIZE2))}")

echo ${#FILE_LIST_0[@]}:${FILE_LIST_0[@]}
echo ${#FILE_LIST_1[@]}:${FILE_LIST_1[@]}
echo ${#FILE_LIST_2[@]}:${FILE_LIST_2[@]}
echo ${#FILE_LIST_3[@]}:${FILE_LIST_3[@]}

Which with the same list of 10 files produces the output:
Code:
3:1 2 3
3:4 5 6
2:7 8
2:9 10

Passing arrays to a function is tricky. The easier approach is to pass any fixed arguments as the 1st arguments to your functions and pass the filenames as a variable argument list with "${FILE_LIST_x[@]}".
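
A minimal sketch of that approach, assuming a hypothetical function name `process_list` and two fixed arguments (a set name and a fold name). The fixed arguments are shifted off, and whatever remains in "$@" is treated as the file list:

```shell
#!/bin/bash
# process_list is a hypothetical name; it only prints what it receives.
process_list () {
    local set_name=$1
    local fold_name=$2
    shift 2                      # drop the two fixed arguments
    local files=("$@")           # everything left is a filename
    echo "$set_name/$fold_name: ${#files[@]} files"
    local f
    for f in "${files[@]}"; do
        echo "  $f"
    done
}

# Sample data for testing only; note the name containing a space.
FILE_LIST_0=("a in.txt" "b_in.txt" "c_in.txt")
process_list set1 fold1 "${FILE_LIST_0[@]}"
```

Because the expansion is quoted, each array element arrives as exactly one argument, so names containing spaces should survive the call.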

Hope this helps...
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 01-13-2015
Could this help you?

Code:
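#!/bin/bash
# Arrays are not part of plain POSIX sh, so this needs bash.

# Print every element of the array whose name is passed as $1.
print_output () {
  local ref="$1[@]"    # indirect reference, e.g. FileArray[@]
  echo "${!ref}"
}

cd /path/to/yourdir || exit 1
# paste joins the ls output into tab-separated groups of four names.
ls | paste - - - - | while IFS=$'\t' read -r -a FileArray
do
    print_output FileArray
done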

This User Gave Thanks to pravin27 For This Post:
# 4  
Old 01-13-2015
Quote:
Originally Posted by Don Cragun
Passing arrays to a function is tricky. The easier approach is to pass any fixed arguments as the 1st arguments to your functions and pass the filenames as a variable argument list with "${FILE_LIST_x[@]}".
I think that I am going to avoid passing the array for now and see how it goes. I can pass LIST_SIZE0 and LIST_SIZE* and let the function create each sub list. This will mean repeating STATS_INPUT_FILENAMES=($(ls './'$SET'/'$FOLD'/'*'in.txt')) for each function call, but I will put up with that for now.

I guess I misunderstood the syntax for grabbing part of an array. The most important issue here is making sure that each file is on exactly one list. The second priority is making the lists as even as possible.

LMHmedchem
# 5  
Old 01-13-2015
I can see two ways to pass an array to a function, at least with my bash 4.3.30:
- pass the element count and then the elements, e.g. scr1.sh A B ${#LIST[@]} "${LIST[@]}" C D, and run a for loop in the callee to assign the elements to a local array
- pass the array as a single argument, e.g. scr1.sh A B "${LIST[*]}" C D, and define the local array with ARR=($3)
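
The first of these two options could look roughly like the sketch below. The function name `show_args` and the sample values are assumptions for illustration; the idea is only that the count tells the callee how many of the following arguments belong to the array:

```shell
#!/bin/bash
# show_args is a hypothetical function: two fixed arguments, then an
# element count, then that many elements, then two trailing arguments.
show_args () {
    local first=$1 second=$2 count=$3
    shift 3
    local -a arr=()
    local i
    for (( i = 0; i < count; i++ )); do
        arr+=("$1")          # collect one element
        shift                # and move on to the next argument
    done
    local third=$1 fourth=$2
    echo "fixed: $first $second $third $fourth"
    echo "array has ${#arr[@]} elements: ${arr[*]}"
}

# Sample data for testing only.
LIST=("one" "two words" "three")
show_args A B "${#LIST[@]}" "${LIST[@]}" C D
```

The second option is shorter, but ARR=($3) splits on whitespace, so an element such as "two words" would become two elements there.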
# 6  
Old 01-13-2015
This is what I have set up instead of passing the array.

calling code
Code:
# the number of available cores
if [ "$CORES" == "quad" ]; then

   # create 4 arrays, each with ~25% of filenames
   NUM_FILE_LISTS='4'
   PROCESSED='0'

   # get size of each subset
   BASE_LIST_SIZE=$(((NUM_INPUT_FILES) / NUM_FILE_LISTS))
   LEFTOVER=$((NUM_INPUT_FILES % NUM_FILE_LISTS))

   # set up start elements and number of elements for all lists
   # list 0
   START_ELEMENT_0='0'
   NUMBER_OF_ELEMENTS_0=$((BASE_LIST_SIZE + (LEFTOVER > 0)))
   # keep track of number of files processed
   let "PROCESSED=$PROCESSED+$NUMBER_OF_ELEMENTS_0"
      
   # list 1   
   START_ELEMENT_1=$PROCESSED
   #let "START_ELEMENT_1=$START_ELEMENT_0+$NUMBER_OF_ELEMENTS_0"
   NUMBER_OF_ELEMENTS_1=$((BASE_LIST_SIZE + (LEFTOVER > 1)))
   let "PROCESSED=$PROCESSED+$NUMBER_OF_ELEMENTS_1"
 
   # list 2  
   START_ELEMENT_2=$PROCESSED
   NUMBER_OF_ELEMENTS_2=$((BASE_LIST_SIZE + (LEFTOVER > 2)))
   # keep track of number of files processed
   let "PROCESSED=$PROCESSED+$NUMBER_OF_ELEMENTS_2"
 
   # list 3  
   START_ELEMENT_3=$PROCESSED
   # assign the rest to this list
   let "NUMBER_OF_ELEMENTS_3=$NUM_INPUT_FILES-$PROCESSED"
   # keep track of number of files processed
   let "PROCESSED=$PROCESSED+$NUMBER_OF_ELEMENTS_3"

      # call functions to process stats
      run_stats_program  $SET  $FOLD  $START_ELEMENT_0  $NUMBER_OF_ELEMENTS_0 &
      # to prevent terminal overrun
      sleep 2
      run_stats_program  $SET  $FOLD  $START_ELEMENT_1  $NUMBER_OF_ELEMENTS_1 &
      sleep 2
      run_stats_program  $SET  $FOLD  $START_ELEMENT_2  $NUMBER_OF_ELEMENTS_2 &
      sleep 2
      run_stats_program  $SET  $FOLD  $START_ELEMENT_3  $NUMBER_OF_ELEMENTS_3 &
      sleep 2
      # wait until subshells have returned
      wait

 fi

called function
Code:
function run_stats_program {

   # function args
   SET_F=$1
   FOLD_F=$2
   START_ELEMENT_F=$3
   NUMBER_OF_ELEMENTS_F=$4
   
   # get list of stats input files in fold directory
   STATS_INPUT_FILENAMES_F=($(ls  './'$SET_F'/'$FOLD_F'/'*'in.txt'))
 
   # create file list as subset of STATS_INPUT_FILENAMES_F
   FILE_LIST=("${STATS_INPUT_FILENAMES_F[@]:$START_ELEMENT_F:$NUMBER_OF_ELEMENTS_F}")

   for INPUT_FILE in "${FILE_LIST[@]}"
   do
      echo $INPUT_FILE
   done
}

All this does at this point is print the filenames. In the end, this will process the 4 file lists in 4 subshells. Processing involves calling a C++ widget to process each file. This setup allows 4 instances of the C++ app to run simultaneously and use the available CPU resources. There will be a similar code block for hex core.

I get that this is written in long form at the moment. It would be nice for the code to be a bit more compact and elegant, but I don't see a clear way to put the function calls in a loop or something like that.

LMHmedchem

Last edited by LMHmedchem; 01-13-2015 at 05:52 PM..
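
One way the four calls could be folded into a loop, offered as a sketch rather than a drop-in replacement. The `run_stats_program` here is a stand-in that only echoes its arguments, the values of SET, FOLD, and NUM_INPUT_FILES are placeholders, and the `sleep 2` between launches is left out:

```shell
#!/bin/bash
# Stand-in for the real function: it only reports the slice it was given.
run_stats_program () {
    echo "set=$1 fold=$2 start=$3 count=$4"
}

SET=set1; FOLD=fold1           # placeholder values for testing only
NUM_INPUT_FILES=10             # placeholder value for testing only
NUM_FILE_LISTS=4               # would be 6 for the hex core case

BASE_LIST_SIZE=$(( NUM_INPUT_FILES / NUM_FILE_LISTS ))
LEFTOVER=$(( NUM_INPUT_FILES % NUM_FILE_LISTS ))

START=0
for (( i = 0; i < NUM_FILE_LISTS; i++ )); do
    # The first LEFTOVER lists each take one extra file.
    COUNT=$(( BASE_LIST_SIZE + (i < LEFTOVER) ))
    run_stats_program "$SET" "$FOLD" "$START" "$COUNT" &
    START=$(( START + COUNT ))
done

# wait until all background jobs have returned
wait
echo "assigned $START of $NUM_INPUT_FILES files"
```

Since only NUM_FILE_LISTS changes between the quad and hex core cases, the same loop should cover both. The order of the background jobs' output is not guaranteed.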
# 7  
Old 01-13-2015
Ok, not the nicest, but it seems to work:
Code:
#!/bin/bash
# Copy the script arguments into an array.
ARRAY_ORIGINAL=("$@")
declare -a ARRAY1 ARRAY2 ARRAY3 ARRAY4
TOTAL=${#ARRAY_ORIGINAL[@]}
MAX=$(( TOTAL / 4 ))

# Each slice is offset:length, so the original array is left untouched
# and the offset simply advances by MAX each time.
count=0
ARRAY1=( "${ARRAY_ORIGINAL[@]:$count:$MAX}" )

count=$(( count + MAX ))
ARRAY2=( "${ARRAY_ORIGINAL[@]:$count:$MAX}" )

count=$(( count + MAX ))
ARRAY3=( "${ARRAY_ORIGINAL[@]:$count:$MAX}" )

count=$(( count + MAX ))
ARRAY4=( "${ARRAY_ORIGINAL[@]:$count:$MAX}" )

echo "1 : ${ARRAY1[*]}"
echo "2 : ${ARRAY2[*]}"
echo "3 : ${ARRAY3[*]}"
echo "4 : ${ARRAY4[*]}"

Code:
bash test.sh  a b c d e f g h i j k l
1 : a b c
2 : d e f
3 : g h i
4 : j k l

Leftovers (that is, when the number of arguments is not evenly divisible by 4) are not handled here.