A simpler way to do this (save a list of files based on part of their name)
Hello,
I have a script that checks every file with a specific extension in a specific directory. The file names contain some numerical output and I am recording the file names with the best n outcomes.
The script finds all files in the directory with the extension .out.txt and uses awk to parse the filename on underscore. In this case, I am reading the first field and looking for the smallest three values across the set of files. In other cases, I may be reading the third field. I understand that in this simple case, all I would have to do is take the first three files, but there will be other cases where that would not work.
This is the script at this point and there is sample input in the attached zip. The input file names look like,
My main question is about how to keep a running record of the file names of the best three values as I loop through the file names. This script does it by brute force and works alright, but I may need to save the top 20 or 50, and I don't look forward to coding that up with the method I used above.
Seems like an egrep would work, where the output of your grep would include the filename and the particular field you wanted, if the value you're interested in is actually in the file. Then you would sort by numeric value on that particular field, then use head or tail depending upon your sort and boom...done. I am not clear on whether you're using the filenames to extract the values yet, but in any case it will be similar. I will look at your data and script and post an example shortly. Someone will probably post a solution if I don't in a short time.
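To make the sort-then-head idea concrete for the case where the value sits in the file name, here is a minimal sketch. The function name best_files and its three arguments are made up for illustration; it assumes underscore-delimited names ending in .out.txt, no newlines in the names, and at least one matching file in the directory.

```shell
# Hypothetical helper: print the "count" file names with the smallest
# numeric value in the given underscore-delimited field.
# The body runs in a subshell, so the cd does not affect the caller.
best_files() (
    cd "$1" || exit 1
    # One name per line, numeric sort on the chosen field, keep the first N.
    # LC_ALL=C makes sort treat "." as the decimal point.
    printf '%s\n' *.out.txt | LC_ALL=C sort -t_ -k"$2,${2}n" | head -n "$3"
)
```

For example, best_files f0 1 3 would list the three names with the smallest first field. Swapping head for tail (or adding r to the sort key) would give the largest values instead.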
Based on filename approach...
Something like this
If I was doing this in cpp, I would definitely use some kind of sort, but I'm not at all familiar with how to do this in a shell. The key value is in the file, but not somewhere where it can be easily found (not in the same place in every file). I have already processed these files and added the value I am interested in to the file name so it will be easier to access. It's easy enough to grab the value out of the filename, but I don't know if that's compatible with your solution.
Based on the data in your zip file and your current bash script, here is another bash script that seems to do what you want, but instead of hard coding the directory, field number, and number of files to be listed, it takes them as parameters:
Code:
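#!/bin/bash
# List the first "count" *.out.txt files in "directory", sorted numerically
# on the given underscore-delimited field of the file name.
IAm=${0##*/}
Usage="Usage: $IAm directory field_number count"
# Require three operands, a usable directory, and all-digit field and count.
if [ $# -ne 3 ] || ! cd "$1" > /dev/null || [ "$2" != "${2%*[^0-9]*}" ] ||
   [ "$3" != "${3%*[^0-9]*}" ]
then    echo "$Usage"
        exit 1
fi
# Sort the names on the chosen field, then let awk label the first "count".
ls *.out.txt | sort -t_ -k"$2,$2n" | awk -F_ -v f="$2" -v c="$3" '
NR > c {exit}
{       s = "th"                                # default ordinal suffix
        if(NR % 100 < 11 || NR % 100 > 13) {    # 11th, 12th, 13th keep "th"
                if(NR % 10 == 1) s = "st"
                else if(NR % 10 == 2) s = "nd"
                else if(NR % 10 == 3) s = "rd"
        }
        printf("%d%s EV file\n%s\nEV MAE %d %s\n\n", NR, s, $0, NR - 1, $f)
}'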
This script was tested using both bash and ksh, but should work with any POSIX conforming shell.
If you save this in a file named test2_copy.sh, make it executable with:
Code:
chmod +x test2_copy.sh
and execute it with:
Code:
./test2_copy.sh f0 1 3
you get the same output as you get if you run ./test_copy.sh:
Code:
1st EV file
48.93_E3200_55.94_E1900_34_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 0 48.93
2nd EV file
49.15_E2700_51.98_E1200_32_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 1 49.15
3rd EV file
49.16_E1600_52.54_E1600_44_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 2 49.16
but you can also run it with:
Code:
./test2_copy.sh f0 3 5
to produce:
Code:
1st EV file
50.62_E1700_51.92_E300_8_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 0 51.92
2nd EV file
49.15_E2700_51.98_E1200_32_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 1 51.98
3rd EV file
49.16_E1600_52.54_E1600_44_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 2 52.54
4th EV file
50.36_E3400_55.09_E3000_35_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 3 55.09
5th EV file
48.93_E3200_55.94_E1900_34_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 4 55.94
which gives you data sorted on the 3rd underscore-delimited field and limited to the first 5 matching files. The color was added only to highlight the sort field; the actual output will not have red text. So, you could sort on the 5th field with:
Code:
./test2_copy.sh f0 5 5
to get:
Code:
1st EV file
50.62_E1700_51.92_E300_8_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 0 8
2nd EV file
49.15_E2700_51.98_E1200_32_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 1 32
3rd EV file
48.93_E3200_55.94_E1900_34_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 2 34
4th EV file
50.36_E3400_55.09_E3000_35_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 3 35
5th EV file
49.16_E1600_52.54_E1600_44_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
EV MAE 4 44
Note, however, that it is doing a numeric sort, so the results are unspecified if you select a field that isn't entirely a number.
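One way to guard against that is to drop names whose chosen field is not a plain number before sorting. This is only a sketch: the function name numeric_field_only is made up, and it assumes "number" means unsigned digits with an optional decimal part.

```shell
# Hypothetical filter: read file names on stdin and pass through only those
# whose underscore-delimited field $1 is a plain decimal number.
numeric_field_only() {
    awk -F_ -v f="$1" '$f ~ /^[0-9]+(\.[0-9]+)?$/'
}
```

It would sit between the listing and the sort, for example ls *.out.txt | numeric_field_only 3 | sort -t_ -k3,3n. With the sample names, asking for field 2 (values like E3200) would pass nothing through, which at least makes the problem visible instead of producing an arbitrary order.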
This User Gave Thanks to Don Cragun For This Post:
Thanks, I will go over this and see if I can get it working. At the end of the day, I will be doing a cp of each file in the list to another directory. One of the problems I have is that I will probably want the top 20 out of 50 or so (not the top 3), so you can see why my method wasn't going to be practical.
It's not entirely clear to me what arguments 2 and 3 are. Is argument 3 the number of files being processed and argument 2 the field being sorted on?
LMHmedchem
I'm sorry for not explaining it better. I thought the usage message comment was sufficient documentation along with the examples I gave. The arguments are:
A pathname of the directory containing the files to be processed.
where field14 always ends with the string .out.txt. I showed you examples using fields 1, 3, and 5 as the sort key since they were the only numeric fields in the names of the files you used in your example that had values that were not a constant. The 12th field was numeric but all filenames had 4 in field12 so sorting on it didn't seem useful.
The count (3rd operand) in my examples was 3 and 5 since you used 3 in your example and you only had 5 files in your example. You can put any number you want there to specify the number of files you want listed. It is happy with 1; it is happy with 32000. Pick the number you want.
After spending some more time looking through this, you did explain it quite well. I just didn't read it as well as you explained it.
I am in the process of trying to copy the files that are found by this to a different location and not having much success. Probably the best solution would be to dump the sorted list into a bash array. Then I can do all the rest I need to do.
This is my attempt to do this (I didn't include the parsing and exception code here but will post the entire working script, once it is...)
This is mildly successful in that it does capture the file names in an array, but all of them are in the first array element. I suppose I could parse array[0] on out.txt, or something kludgey like that, but I am guessing there is a better way.
I know there are some ways to copy in awk, and also with system, but I need to extract some additional information from the filename to locate an additional file, and the only way I know how to do that is in bash.
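A common cause of everything landing in the first element is quoting the command substitution, as in array=("$(...)"), which hands bash a single word. A minimal sketch of one alternative, assuming bash 4 or later for mapfile, file names without newlines, and at least one matching file; the function name copy_best and its arguments are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: copy the "count" best-ranked *.out.txt files from the
# directory "src" to the directory "dest", ranking by a numeric name field.
copy_best() {
    local src=$1 field=$2 count=$3 dest=$4 files name
    # mapfile -t stores each input line as its own array element.
    # The cd happens inside the process substitution, so the caller's
    # working directory is unchanged.
    mapfile -t files < <(cd "$src" && printf '%s\n' *.out.txt |
        LC_ALL=C sort -t_ -k"$field,${field}n" | head -n "$count")
    for name in "${files[@]}"; do
        cp -- "$src/$name" "$dest/"
    done
    printf '%s\n' "${#files[@]}"    # report how many names were selected
}
```

Called as copy_best f0 1 20 /some/dest, it would copy the 20 files with the smallest first field. Inside the loop, IFS=_ read -r -a parts <<< "$name" would split a name into its underscore-delimited pieces if other fields are needed to locate a companion file.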