Post 302911921 by LMHmedchem on Monday 4th of August 2014 01:34:38 PM
Finding the right file with multiple sort criteria

Hello,

I have files in a directory with names like,
Code:
./f0/84.40_E1200_85.39_E1300_f0_r00_1300-ON-0.25_S7A_v4_47.19.1.out.txt
./f0/84.40_E1200_85.83_E1200_f0_r00_1200-ON-0.25_S7A_v4_47.19.1.out.txt
./f0/84.60_E1100_86.45_E1100_f0_r00_1100-ON-0.25_S7A_v4_47.19.1.out.txt
./f0/85.20_E1000_87.26_E1000_f0_r00_1000-ON-0.25_S7A_v4_47.19.1.out.txt
./f0/86.42_E900_88.14_E900_f0_r00_900-ON-0.25_S7A_v4_47.19.1.out.txt
./f0/88.88_E800_90.07_E800_f0_r00_800-ON-0.25_S7A_v4_47.19.1.out.txt

I need to find the file with the smallest value in the first '_'-delimited field (84.40 in this case). Where more than one file has this value, as above, I need the one with the smallest integer at the start of the seventh field (1300-ON-0.25, 1200-ON-0.25, etc.). For this example, I want to find the file below and assign its name to a bash variable.

84.40_E1200_85.83_E1200_f0_r00_1200-ON-0.25_S7A_v4_47.19.1.out.txt

There will never be more than 2 files with the same value for the field I am looking at.
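
As a rough sketch of the selection described above: a single sort call can take both criteria as keys, so the tie on the first field is broken by the leading integer of the seventh field in the same pass. This assumes the path has already been stripped and that the integer always starts the seventh '_'-delimited field. The names and BEST variables are only for illustration, and LC_ALL=C is there so that '.' is read as the decimal point.

```shell
# Three of the sample names from the listing above (paths already stripped).
names='84.40_E1200_85.39_E1300_f0_r00_1300-ON-0.25_S7A_v4_47.19.1.out.txt
84.40_E1200_85.83_E1200_f0_r00_1200-ON-0.25_S7A_v4_47.19.1.out.txt
84.60_E1100_86.45_E1100_f0_r00_1100-ON-0.25_S7A_v4_47.19.1.out.txt'

# Primary key: field 1 only (the float).
# Secondary key: field 7 only; sort -n compares its leading integer
# (1300, 1200, ...) and ignores the "-ON-0.25" that follows.
BEST=$(printf '%s\n' "$names" | LC_ALL=C sort -t_ -k1,1n -k7,7n | head -n 1)

echo "$BEST"
# prints 84.40_E1200_85.83_E1200_f0_r00_1200-ON-0.25_S7A_v4_47.19.1.out.txt
```

Because the second key only matters when the first key ties, the same command also gives the right answer when just one file has the smallest value.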

Another caveat is that there are times when I would be looking at the float in the third '_'-delimited field instead of the first. If $STOP_ON is T, I would sort on the third number, and if $STOP_ON is V, I would sort on the first.

I guess what I would do here is something like,
Code:
if [ "$STOP_ON" = "T" ]; then
   # strip the path from the front of each filename ($NF is the last
   # '/'-separated field), then sort numerically on field 3 only
   FILES=$(ls "$CURRENT_DIR"/*out.txt | \
           awk -F/ '{print $NF}' | \
           sort -t_ -k3,3n | \
           head -n 2)
fi
if [ "$STOP_ON" = "V" ]; then
   # strip the path from the front of each filename ($NF is the last
   # '/'-separated field), then sort numerically on field 1 only
   FILES=$(ls "$CURRENT_DIR"/*out.txt | \
           awk -F/ '{print $NF}' | \
           sort -t_ -k1,1n | \
           head -n 2)
fi

This would strip off the path, sort on the float that I need, and then pick off the top two. I could then process the list to find the one with the lowest value for the second sorting criterion.

This seems rather involved, and there is another problem: there may not be two files with the same value for the first sort field, as there are in this example. That means I would first have to check the results of the above code for that as well.

It seems as if there should be a better way to do this. Please let me know if I have botched my explanation and you want me to try again.
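
For what it is worth, here is a minimal sketch of one possible simpler approach, not a tested solution: choose the primary sort field from $STOP_ON, add the seventh field as a secondary key, and take the first line. The function name pick_file and the variable BEST_FILE are made up for illustration, and the ./f0 and V defaults are assumptions taken from the example listing. It assumes filenames contain no newlines and that the integer always leads the seventh '_'-delimited field.

```shell
# pick_file DIR STOP_ON   (hypothetical helper, name chosen for illustration)
# Prints the basename of the best-ranked *out.txt file in DIR,
# or nothing if no file matches.
pick_file() {
   local dir=$1 key f
   case $2 in
      T) key=3 ;;   # rank on the float in field 3
      V) key=1 ;;   # rank on the float in field 1
      *) echo "STOP_ON must be T or V" >&2; return 1 ;;
   esac
   for f in "$dir"/*out.txt; do
      [ -e "$f" ] || continue       # the glob matched nothing
      printf '%s\n' "${f##*/}"      # strip the leading path
   done | LC_ALL=C sort -t_ -k"$key,${key}n" -k7,7n | head -n 1
}

# Assumed defaults, only so the sketch runs on its own.
CURRENT_DIR=${CURRENT_DIR:-./f0}
STOP_ON=${STOP_ON:-V}

BEST_FILE=$(pick_file "$CURRENT_DIR" "$STOP_ON")
echo "$BEST_FILE"
```

Since the secondary key is only consulted when the primary key ties, there is no need to check whether two files share the smallest value. Using a glob loop instead of parsing ls output also avoids depending on how deep the path is.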

LMHmedchem
 
