Find the nth value in a list

09-12-2014

Hello,

I have some code that searches file names, sorts them on a field in the name, and returns the top value.

Code:

# list the contents, strip off the path, and sort on field 3 as a number, returns top value
TOP_OUTCOME=$(ls  './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS'/'$A_SET'/'*'out.txt' | \
              awk 'BEGIN {FS="/"} {print $7}' | \
              sort -t_ -k 3 -n | \
              head -n 1)

The file names look like,

Code:

V_se_69.36_E1_f5_r09_1-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_69.62_E195_f5_r17_195-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_70.14_E254_f5_r06_254-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_70.41_E230_f5_r08_230-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_71.00_E150_f5_r42_150-ON-0.25_S3C_v1_36.35.1.out.txt

The code sorts on the real number in the third underscore-delimited field. There are circumstances where I may not actually want the top value, but I need to read the file name to find out. In this case, since the value of the 4th field is E1 (the 1 is the important thing), I would want to look at the next value.

I guess I could collect the top 10 or so and loop through the file names, but I was also considering something like grabbing the top value as above, testing a field (I will want the top value 99% of the time, so most will pass the test), and then grabbing the 2nd value in the event of a fail, testing that name, etc.

It seems like grabbing a bunch,

Code:

TOP_OUTCOME=$(ls  './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS'/'$A_SET'/'*'out.txt' | \
              awk 'BEGIN {FS="/"} {print $7}' | \
              sort -t_ -k 3 -n | \
              head -n 10)

and testing fields in a while loop would be the most efficient method, but I thought I would check. There are times when I would need to use tail instead of head with the same code, so that complicates things somewhat.

LMHmedchem

09-12-2014

How about

Code:

TOP_OUTCOME=$(cd './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS'/'$A_SET'/'; ls *'out.txt' | sort -t_ -k3,3n | awk -F_ '$4 == "E1" {next} {print; exit}')
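To try the pipeline without touching the file system, here is a minimal sketch that feeds it the five sample names from the first post. The `names` variable is only a hypothetical stand-in for the `ls` output, and `LC_ALL=C` is an assumption added so the decimal point is read the same way in every locale.

```shell
# Hypothetical stand-in for the "ls" output: the sample names from the
# first post, deliberately listed out of order.
names='V_se_70.14_E254_f5_r06_254-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_69.36_E1_f5_r09_1-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_71.00_E150_f5_r42_150-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_69.62_E195_f5_r17_195-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_70.41_E230_f5_r08_230-ON-0.25_S3C_v1_36.35.1.out.txt'

# Sort numerically on field 3, skip every name whose 4th field is E1,
# then print the first remaining name and stop.
TOP_OUTCOME=$(printf '%s\n' "$names" |
              LC_ALL=C sort -t_ -k3,3n |
              awk -F_ '$4 == "E1" {next} {print; exit}')

echo "$TOP_OUTCOME"
```

With these names the lowest value (69.36) belongs to an E1 file, so that name is skipped and the 69.62 name is printed.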

RudiC

09-12-2014

If you put the list in a file, you can use sed to print a particular line:

Code:

line=10
sed -n "${line}p" file

I'm not sure if this is the best way or even truly supported, but it works for me. The curly braces are critical to keep the variable name separate from the required trailing p.
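The same idea also works on a pipeline rather than a file. The sketch below is only an illustration: `nth_line` is a hypothetical helper name, and the input is three of the sample names from the first post. Adding `q` makes sed stop reading once the requested line has been printed.

```shell
# nth_line N: print line N of standard input, then quit (hypothetical helper).
nth_line() {
    sed -n "${1}{p;q;}"
}

# Pick the 2nd best name from a list sorted on field 3.
# LC_ALL=C is an assumption, used so the decimal point parses consistently.
second=$(printf '%s\n' \
             'V_se_70.14_E254_f5_r06_254-ON-0.25_S3C_v1_36.35.1.out.txt' \
             'V_se_69.36_E1_f5_r09_1-ON-0.25_S3C_v1_36.35.1.out.txt' \
             'V_se_69.62_E195_f5_r17_195-ON-0.25_S3C_v1_36.35.1.out.txt' |
         LC_ALL=C sort -t_ -k3,3n |
         nth_line 2)

echo "$second"
```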

I hope that this helps,

Regards,
Robin

rbatte1

09-12-2014

Quote:

Originally Posted by RudiC

How about

Code:

TOP_OUTCOME=$(cd './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS'/'$A_SET'/'; ls *'out.txt' | sort -t_ -k3,3n | awk -F_ '$4 == "E1" {next} {print; exit}')

Is this iterative, meaning if my file names look like,

Code:

S_se_41.30_E1_f5_r34_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_59.22_E1_f5_r05_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_61.42_E1375_f5_r47_1375-ON-0.25_S3B_v1_36.35.1.out.txt

is this going to find the third file since the first two are E1?

In the end, I will probably need to do something like <= E5, meaning I will have to separate the E from the integer so I can evaluate it numerically.

This code works,

Code:

MIN_TO_ACCEPT='5'
SORT_ORDER='head'
NUMBER_TO_PROCESS='10'

# get list of $NUMBER_TO_PROCESS best outcomes
TOP_OUTCOMES=($(ls  './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS_F'/'$A_SET'/'*'out.txt' | \
                           awk 'BEGIN {FS="/"} {print $7}' | \
                           sort -t_ -k 3 -n | \
                           $SORT_ORDER -n $NUMBER_TO_PROCESS))

# reset (code is in a loop)
unset OUTCOME_FIELDS
OPTIMUM_EPOCH=0
COUNTER=0

# keep looping until an epoch value of at least $MIN_TO_ACCEPT is found
while [ "$OPTIMUM_EPOCH" -lt "$MIN_TO_ACCEPT" ]
do
   # if every collected file name has been examined, quit with an error
   if [ "$COUNTER" -ge "${#TOP_OUTCOMES[@]}" ]; then
      echo "no values of at least" $MIN_TO_ACCEPT "were found for" $SET $ANNEALING_SET $FOLD
      exit 1
   fi
   # store file name
   TOP_OUTCOME=${TOP_OUTCOMES[$COUNTER]}
   # split the file name on underscores; the epoch is the 4th field (index 3)
   IFS='_' read -r -a OUTCOME_FIELDS <<< "$TOP_OUTCOME"
   OPTIMUM=${OUTCOME_FIELDS[3]}
   # remove leading E so the value can be compared numerically
   OPTIMUM_EPOCH=${OPTIMUM#E}
   RAND_SET=${OUTCOME_FIELDS[5]}
   echo "RAND_SET" $RAND_SET "OPTIMUM" $OPTIMUM_EPOCH
   (( COUNTER++ ))
done

echo "SET" $SET "FOLD" $FOLD "TOP_OUTCOME"
echo $TOP_OUTCOME

This is relatively long-winded and can be cleaned up somewhat, but it works fine for head sorting. Since most of the time I will just want the first file, it does not seem like an efficient solution.

Also, when I use tail instead of head, it seems like I get the top 10 files, but they are in reverse order, meaning that the 10th best outcome is processed first. Am I not using tail correctly?
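A note on the ordering: tail prints the last lines of its input in their original order, so after an ascending sort the highest value is the last line it prints, not the first. One possible alternative, sketched here with three of the sample names from the first post, is to reverse the sort with the `r` flag and keep using head. The variable names and the use of `LC_ALL=C` are assumptions made for the illustration.

```shell
# Three of the sample names from the first post (values 69.36, 69.62, 70.14).
names='V_se_69.62_E195_f5_r17_195-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_70.14_E254_f5_r06_254-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_69.36_E1_f5_r09_1-ON-0.25_S3C_v1_36.35.1.out.txt'

# Ascending sort + tail: the two highest values, highest one LAST.
with_tail=$(printf '%s\n' "$names" | LC_ALL=C sort -t_ -k3,3n | tail -n 2)

# Descending sort + head: the same two names, highest one FIRST.
with_head=$(printf '%s\n' "$names" | LC_ALL=C sort -t_ -k3,3nr | head -n 2)

echo "tail:"; echo "$with_tail"
echo "head on reversed sort:"; echo "$with_head"
```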

LMHmedchem

09-13-2014

Quote:

Originally Posted by LMHmedchem

Is this iterative, meaning if my file names look like,

Code:

S_se_41.30_E1_f5_r34_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_59.22_E1_f5_r05_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_61.42_E1375_f5_r47_1375-ON-0.25_S3B_v1_36.35.1.out.txt

is this going to find the third file since the first two are E1?

Yes. Why don't you just try it?

Quote:

In the end, I will probably need to do something like <= E5, meaning I will have to separate the E from the integer so I can evaluate it numerically.

Make the awk part read

Code:

awk -F_ '0+substr($4,2) <= 5 {next} {print; exit}'
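If the threshold should not be hard-coded, it could be passed in with awk's `-v` option. This is only a sketch: `MAX_TO_SKIP` and `limit` are hypothetical names, the input is the three sample names quoted above, and `LC_ALL=C` is an assumption added so the decimal point parses the same way everywhere.

```shell
MAX_TO_SKIP=5   # hypothetical name: epochs up to and including this are skipped

names='S_se_59.22_E1_f5_r05_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_61.42_E1375_f5_r47_1375-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_41.30_E1_f5_r34_1-ON-0.25_S3B_v1_36.35.1.out.txt'

# substr($4,2) drops the leading E; adding 0 forces a numeric comparison.
TOP_OUTCOME=$(printf '%s\n' "$names" |
              LC_ALL=C sort -t_ -k3,3n |
              awk -F_ -v limit="$MAX_TO_SKIP" \
                  '0+substr($4,2) <= limit {next} {print; exit}')

echo "$TOP_OUTCOME"
```

Both E1 names are skipped here, so the E1375 name is printed even though it has the highest value in field 3.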

RudiC
