Process a specific number of files in a list



I have a list of files that was created with,


I am doing a loop on this list


but I may not want to process everything. Is there a simple way to process just the first 5, 10, or n files in this list?
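As a minimal sketch of limiting a list to its first n entries: the list contents and the variable names FILES, N, and FIRST_N below are made up for illustration, and the approach assumes one file name per line with no whitespace inside any name.

```shell
# Hypothetical newline-separated list, standing in for the real one
FILES=$(printf '%s\n' a.txt b.txt c.txt d.txt e.txt)
N=3   # how many entries to process

# head -n keeps only the first N lines of the list
FIRST_N=$(printf '%s\n' "$FILES" | head -n "$N")

# Unquoted expansion splits on whitespace, so names must not contain spaces
for INPUT in $FIRST_N; do
    echo "processing $INPUT"
done
```

Changing N is then the only edit needed to process more or fewer files.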

I process each file name with awk to extract two numbers from it,
#remove path from filename
FILENAME=`echo $INPUT | awk 'BEGIN {FS="/"} {print $5}'`

# store random ini set
RAND_SET=`echo $FILENAME | awk 'BEGIN {FS="_"} {print $5}'`

# find MAE min epoch values
MAE_EPOCH=`echo $FILENAME | awk 'BEGIN {FS="_"} {print $2}'`

#remove leading E
#create value at -100 epochs

I end up with the numbers $MAE_EPOCH and $RAND_SET for each file. What I would really like to do is scan all the files, extract MAE_EPOCH from each, and then process only the best few, such as the top 5 (the files with the lowest values of MAE_EPOCH). I need to know the value of RAND_SET associated with each MAE_EPOCH value.

File names look like,

The text in blue is what I am capturing. I would need both the MAE_EPOCH and the corresponding RAND_SET value for each file I am going to process. I guess I would loop on the file set and then store the data for the files I need to process, but I'm not so sure how to do that kind of thing in bash.

Help would be greatly appreciated,


If you cd into the directory with the files, you can craft a single pipeline to do the work (at least, as I understood it): ls to generate a list, grep to filter by name, sort to numerically sort by MAE_EPOCH, head to limit the number of results, and awk to extract whatever portions of the underscore-delimited records are needed.

The output of that pipeline can then be fed to a while-read loop for processing.
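Here is a rough sketch of that pipeline feeding a while-read loop. The exact file name layout is an assumption on my part (value_E<epoch>_x_y_<rand_set>_z.out.txt, so MAE_EPOCH is field 2 and RAND_SET is field 5, matching the awk calls in the question), and the function name top_by_epoch and the demo files are invented for illustration.

```shell
# Hypothetical name layout: <value>_E<epoch>_<x>_<y>_<rand_set>_<z>.out.txt
# top_by_epoch N: print "MAE_EPOCH RAND_SET" for the N lowest epochs
# among the *.out.txt files in the current directory.
# The sort key -k2.2,2n means: field 2, starting at its 2nd character
# (skipping the "E"), compared numerically.
top_by_epoch() {
    ls | grep '\.out\.txt$' |
        sort -t_ -k2.2,2n |
        head -n "$1" |
        awk -F_ '{ sub(/^E/, "", $2); print $2, $5 }'
}

# Demo in a scratch directory with made-up files
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 1.5_E300_a_b_26_c.out.txt 2.5_E100_a_b_17_c.out.txt \
      0.5_E200_a_b_31_c.out.txt notes.txt

# Each output line is split into the two loop variables
top_by_epoch 2 | while read -r MAE_EPOCH RAND_SET; do
    echo "MAE_EPOCH=$MAE_EPOCH RAND_SET=$RAND_SET"
done
```

Because the loop sits on the right side of a pipe, it runs in a subshell in most shells, so variables set inside it are not visible afterwards. That is fine if the next program is called from inside the loop.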

At one point, I had this set up to process the first few files from


but when the file names looked like,


The file starting with 100.22_E100 got processed first, presumably because the names were compared as text and 1 sorts before 9. As I think about it, I described it wrong. It is the first field (the real number) that I would want to sort on and then retrieve the other numbers. For the files above, if I wanted the top two results I would first look at the number in the first field,


Based on the values, I would want to extract,

MAE_EPOCH=100, RAND_SET=26 for the top file
MAE_EPOCH=100, RAND_SET=17 for the second file

These are the numbers I need to pass to the next program.

Is there some reason to use grep to filter the name instead of doing ls *.out.txt?

I guess I would be using some combination of -t -k -n with sort, like

ls *.out.txt | sort -t_ -k 1 -n | head -n 2 | awk

This would give me the top two files in the list above?
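To check what the sort options do, here is a small sketch comparing a plain text sort with a numeric sort on the first underscore-delimited field. The three file names are made up (only the leading 100.22_E100 and the RAND_SET values 26 and 17 echo the post above), and -k1,1n is used instead of -k 1 -n so the key stops at the end of field 1.

```shell
# Hypothetical file names; only the first field matters for the sort
names='100.22_E100_a_b_5_c.out.txt
9.87_E100_a_b_26_c.out.txt
12.40_E100_a_b_17_c.out.txt'

# Plain sort compares character by character, so 100.22 comes before 9.87
lexical=$(printf '%s\n' "$names" | LC_ALL=C sort | head -n 2)

# -t_ sets the field separator, -k1,1n sorts numerically on field 1 only
numeric=$(printf '%s\n' "$names" | sort -t_ -k1,1n | head -n 2)

echo "text sort:"
printf '%s\n' "$lexical"
echo "numeric sort on field 1:"
printf '%s\n' "$numeric"
```

With these names, the numeric version returns the 9.87 and 12.40 files, which is the "top two by first field" behavior being asked about.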

I'm sure I will have to play around with this, but thanks for the head start. I am a bit unclear on how to pass the result of the pipe into my loop. Do you have a link for an example of something like that?


Well I have it working with this,


cd ./the_folder_with_the_files

# find all files .out.txt and sort on the real number in the first position
# return the file names for the top $NUMBER_TO_PROCESS in sorted list
FILES=$(ls *.out.txt | sort -t_ -k 1 -n | head -n $NUMBER_TO_PROCESS)

# loop on all file names returned
for INPUT in $FILES; do

   # current file
   echo $INPUT

   # store random ini set
   RAND_SET=`echo $INPUT | awk 'BEGIN {FS="_"} {print $5}'`
   echo "RAND_SET"  $RAND_SET

   # find MAE min epoch value
   MAE_EPOCH=`echo $INPUT | awk 'BEGIN {FS="_"} {print $2}'`
   # remove leading E
   # backup if specified

done

This gives the behavior that I am looking for.

I left the awk stuff in the loop since there are a couple of items in the file name string that need to be retrieved and assigned to variables. Does this make sense?
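For what it's worth, the fields can also be pulled out in the shell itself, without one awk call per value. This is only a sketch: the sample name is hypothetical, and the variable names value, epoch_field, f3, f4, and rest are placeholders I introduced.

```shell
INPUT=9.87_E100_a_b_26_c.out.txt   # hypothetical file name

# Split on "_" in one step; "rest" collects everything after field 5
IFS=_ read -r value epoch_field f3 f4 RAND_SET rest <<EOF
$INPUT
EOF

# ${var#E} removes a leading "E" if one is present
MAE_EPOCH=${epoch_field#E}

echo "value=$value MAE_EPOCH=$MAE_EPOCH RAND_SET=$RAND_SET"
```

Setting IFS on the same line as read changes the separator for that one command only, so the rest of the script keeps the default word splitting.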

Is there some reason to filter the file list with grep instead of using the glob?


