Find the nth value in a list


 
# 1  
Old 09-12-2014

Hello,

I have some code that searches file names, sorts them on a field in the name, and returns the top value.
Code:
# list the contents, strip off the path, and sort on field 3 as a number, returns top value
TOP_OUTCOME=$(ls  './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS'/'$A_SET'/'*'out.txt' | \
              awk 'BEGIN {FS="/"} {print $7}' | \
              sort -t_ -k 3 -n | \
              head -n 1)

The file names look like,
Code:
V_se_69.36_E1_f5_r09_1-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_69.62_E195_f5_r17_195-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_70.14_E254_f5_r06_254-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_70.41_E230_f5_r08_230-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_71.00_E150_f5_r42_150-ON-0.25_S3C_v1_36.35.1.out.txt

The code sorts on the real number in the third underscore-delimited field. There are circumstances where I may not actually want the top value, but I need to read the file name to find out. In this case, since the value of the 4th field is E1 (the 1 is the important thing), I would want to look at the next value.

I guess I could collect the top 10 or so and loop through the file names, but I was also considering something like grabbing the top value as above, testing a field (I will want the top value 99% of the time, so most will pass the test), and then grabbing the 2nd value in the event of a fail, testing that name, etc.

It seems like grabbing a bunch,
Code:
TOP_OUTCOME=$(ls  './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS'/'$A_SET'/'*'out.txt' | \
              awk 'BEGIN {FS="/"} {print $7}' | \
              sort -t_ -k 3 -n | \
              head -n 10)

and testing fields in a while loop would be the most efficient method, but I thought I would check. There are times where I would need to use tail instead of head with the same code, so that complicates things some I guess.

LMHmedchem
# 2  
Old 09-12-2014
How about
Code:
TOP_OUTCOME=$(cd './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS'/'$A_SET'/'; ls *'out.txt' |sort -t_ -k3,3n | awk -F_ '$4 == "E1" {next} {print; exit}')

?
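To see what this pipeline does without touching the real directories, here is a minimal sketch that feeds it three of the sample names from the first post. The `names` and `first_ok` variables are made up for illustration, `printf` stands in for `ls`, and `LC_ALL=C` is an added assumption so the decimal point sorts the same way in every locale.

```shell
# Sample names from the thread, deliberately out of order; printf stands in for ls.
names='V_se_70.14_E254_f5_r06_254-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_69.36_E1_f5_r09_1-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_69.62_E195_f5_r17_195-ON-0.25_S3C_v1_36.35.1.out.txt'

# Sort numerically on field 3 only (-k3,3n), skip every line whose 4th field
# is exactly E1, then print the first remaining line and stop.
first_ok=$(printf '%s\n' "$names" |
           LC_ALL=C sort -t_ -k3,3n |
           awk -F_ '$4 == "E1" {next} {print; exit}')

echo "$first_ok"
# prints V_se_69.62_E195_f5_r17_195-ON-0.25_S3C_v1_36.35.1.out.txt
```

The 69.36 entry sorts first but is skipped because its 4th field is E1, so the 69.62 entry is returned. Because `next` applies to every matching line, any number of leading E1 entries would be skipped the same way.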
This User Gave Thanks to RudiC For This Post:
# 3  
Old 09-12-2014
If you get the list in a file, you can use sed to get a particular line:-
Code:
line=10
sed -n ${line}p file

I'm not sure if this is the best way or even truly supported, but it works for me. The curly brackets (are they braces?) are critical to keep the variable name clear from the required trailing p
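For what it is worth, the same idea also works on a pipeline, so the sorted list does not have to be written to a file first. The sketch below is only an illustration: `nth_line` is a hypothetical helper name, the sample names are borrowed from the first post, and `LC_ALL=C` is an added assumption to keep the numeric sort locale-independent. The `q` makes sed quit as soon as the wanted line has been printed.

```shell
# nth_line N: print line N of standard input, then stop reading.
# The braces around 1 keep the parameter separate from the sed commands.
nth_line() {
    sed -n "${1}{p;q;}"
}

names='V_se_70.14_E254_f5_r06_254-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_69.36_E1_f5_r09_1-ON-0.25_S3C_v1_36.35.1.out.txt
V_se_69.62_E195_f5_r17_195-ON-0.25_S3C_v1_36.35.1.out.txt'

# Second entry of the list after sorting numerically on field 3.
second=$(printf '%s\n' "$names" | LC_ALL=C sort -t_ -k3,3n | nth_line 2)

echo "$second"
# prints V_se_69.62_E195_f5_r17_195-ON-0.25_S3C_v1_36.35.1.out.txt
```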


I hope that this helps,

Regards,
Robin
This User Gave Thanks to rbatte1 For This Post:
# 4  
Old 09-12-2014
Quote:
Originally Posted by RudiC
How about
Code:
TOP_OUTCOME=$(cd './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS'/'$A_SET'/'; ls *'out.txt' |sort -t_ -k3,3n | awk -F_ '$4 == "E1" {next} {print; exit}')

?
Is this iterative, meaning if my files names look like,
Code:
S_se_41.30_E1_f5_r34_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_59.22_E1_f5_r05_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_61.42_E1375_f5_r47_1375-ON-0.25_S3B_v1_36.35.1.out.txt

is this going to find the third file since the first two are E1?

In the end, I will probably need to do something like <= E5, meaning I will have to separate the E from the integer so I can evaluate it numerically.

This code works,
Code:
MIN_TO_ACCEPT='5'
SORT_ORDER='head'
NUMBER_TO_PROCESS='10'

# get list of $NUMBER_TO_PROCESS best outcomes
TOP_OUTCOMES=($(ls  './'$SET_F'/'$FOLD'/'$FOLD'_anneal/'$C_PARAMS_F'/'$A_SET'/'*'out.txt' | \
                           awk 'BEGIN {FS="/"} {print $7}' | \
                           sort -t_ -k 3 -n | \
                           $SORT_ORDER -n $NUMBER_TO_PROCESS))

# reset (code is in a loop)
unset OUTCOME_FIELDS
OPTIMUM_EPOCH=0
COUNTER=0

# keep looping until a value gt $MIN_TO_ACCEPT is found
while [ "$OPTIMUM_EPOCH" -lt "$MIN_TO_ACCEPT" ]
do
   # store file name
   TOP_OUTCOME=${TOP_OUTCOMES[$COUNTER]}
   # parse the file name; the epoch (e.g. E1375) is the 4th "_" field, array index 3
   IFS='_' read -a  OUTCOME_FIELDS <<< "$TOP_OUTCOME"
   OPTIMUM_EPOCH=${OUTCOME_FIELDS[3]}
   # remove leading E so the loop condition can compare it as a number
   OPTIMUM_EPOCH=${OPTIMUM_EPOCH#E}
   RAND_SET=${OUTCOME_FIELDS[5]}
   echo "RAND_SET" $RAND_SET "OPTIMUM" $OPTIMUM_EPOCH
   (( COUNTER++ ))
   # if counter gets past number to process, quit with error
   if [ "$COUNTER" -eq "$NUMBER_TO_PROCESS" ]; then
      echo "no values greater than" $MIN_TO_ACCEPT "were found for" $SET $ANNEALING_SET $FOLD
      exit
   fi
done

echo "SET" $SET "FOLD" $FOLD "TOP_OUTCOME"
echo $TOP_OUTCOME

This is relatively long hand and can be cleaned up somewhat but works fine for head sorting. Since most of the time I will just want the first file, it does not seem like an efficient solution.

Also, when I use tail instead of head, it seems like I get the top 10 files, but they are in reverse order, meaning that the 10th best outcome is processed first. Am I not using tail correctly?

LMHmedchem
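On the tail question: tail is most likely doing what it is meant to do. `tail -n 10` prints the last ten lines of the ascending sort in their original order, so the largest value comes out last and the 10th largest comes out first. One way around this, sketched below with plain numbers standing in for the file names (the `values`, `with_tail` and `with_head` names are made up for illustration), is to reverse the sort and keep using head.

```shell
# Plain numbers stand in for the sort field of the file names.
values='70.41
69.36
71.00
69.62
70.14'

# tail keeps the last lines of the ascending sort: the largest value is last.
with_tail=$(printf '%s\n' "$values" | LC_ALL=C sort -n | tail -n 3)

# Reversing the sort (-r) and taking head puts the largest value first.
with_head=$(printf '%s\n' "$values" | LC_ALL=C sort -rn | head -n 3)

echo "tail:" $with_tail   # prints tail: 70.14 70.41 71.00
echo "head:" $with_head   # prints head: 71.00 70.41 70.14
```

With the file names from this thread, that would presumably mean switching between `sort -t_ -k3,3n` and `sort -t_ -k3,3nr` and always piping into head, instead of switching between head and tail.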
# 5  
Old 09-13-2014
Quote:
Originally Posted by LMHmedchem
Is this iterative, meaning if my files names look like,
Code:
S_se_41.30_E1_f5_r34_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_59.22_E1_f5_r05_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_61.42_E1375_f5_r47_1375-ON-0.25_S3B_v1_36.35.1.out.txt

is this going to find the third file since the first two are E1?
Yes. Why don't you just try it?

Quote:
In the end, I will probably need to do something like <= E5, meaning I will have to separate the E from the integer so I can evaluate it numerically.
Make the awk part read
Code:
awk -F_ '0+substr($4,2) <= 5 {next} {print; exit}'
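If the threshold should come from a shell variable rather than being hard-coded, it can be handed to awk with `-v`. The following is a sketch under assumptions: `min_to_accept`, `names` and `picked` are illustrative names (the first mirrors MIN_TO_ACCEPT from the earlier post), `printf` stands in for `ls`, and `LC_ALL=C` is added only to keep the sort locale-independent.

```shell
min_to_accept=5   # hypothetical threshold, mirrors MIN_TO_ACCEPT above

# Sample names from the thread; printf stands in for ls.
names='S_se_41.30_E1_f5_r34_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_59.22_E1_f5_r05_1-ON-0.25_S3B_v1_36.35.1.out.txt
S_se_61.42_E1375_f5_r47_1375-ON-0.25_S3B_v1_36.35.1.out.txt'

# substr($4,2) drops the leading E; adding 0 forces a numeric comparison.
# Lines at or below the threshold are skipped, the first one above it is printed.
picked=$(printf '%s\n' "$names" |
         LC_ALL=C sort -t_ -k3,3n |
         awk -F_ -v min="$min_to_accept" '0+substr($4,2) <= min {next} {print; exit}')

echo "$picked"
# prints S_se_61.42_E1375_f5_r47_1375-ON-0.25_S3B_v1_36.35.1.out.txt
```

Both E1 entries are skipped and the E1375 entry is returned. If every name fails the test, `picked` is simply empty, which the calling script can check for.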
