Pass an array to awk to sequentially look for a list of items in a file


 
# 8  
Old 04-10-2019
Quote:
Originally Posted by Don Cragun
In post #1, the labels in your sample input files were "train statistics" and "test statistics". In your latest code you have labels with underscores instead of spaces. You'll have to be sure your stats files and your labels match.
I temporarily removed the spaces from both the test input files and the shell code to eliminate the space as a possible source of problems. I try to avoid having spaces within a single field of a delimited text file but it cannot always be avoided. I have it working now with the space.

Quote:
Originally Posted by Don Cragun
In post #3, you included a header line in your output; I don't see anything in your latest code that prints that header. And, unless each of your stats files contains all of the sets of statistics and includes them in the same order, what you have shown us will end up printing data for different sets of statistics under each other in the output with no indication of which set they came from. Do each of your stats files contain statistics from all of the possible sets of statistics and are each of those sets of statistics present in the same order in each stats file?
The header is created when the logfile is created. I am using this code as a function that gets called when each output file has been created. The code appends the statistics from the processed file to the existing logfile. Entries for all of the statistics exist in every output file in the same order. The values may all be 0.0 if a given set of statistics were not calculated, but the entry will be there.
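
The create-once, append-many arrangement described above could be sketched roughly as follows. This is only an illustration: the function names start_log and append_entry, the three-column header, and the run2.txt values are made up and are not taken from the real script.

```shell
#!/bin/sh
# Hypothetical sketch: header written once, entries appended afterwards.
tmpdir=$(mktemp -d)
logfile="$tmpdir/stats_log.txt"

# Write the header only when the log does not exist yet or is empty.
start_log () {
    [ -s "$1" ] || printf 'filename\ttrain_r2\ttest_r2\n' > "$1"
}

# Append one tab-separated entry; the values passed in are placeholders.
append_entry () {
    printf '%s\t%s\t%s\n' "$2" "$3" "$4" >> "$1"
}

start_log "$logfile"
append_entry "$logfile" run1.txt 0.8320 0.7183
start_log "$logfile"      # no-op: the log already has content
append_entry "$logfile" run2.txt 0.8011 0.7002

log_lines=$(wc -l < "$logfile" | tr -d ' ')
first_line=$(head -n 1 "$logfile")
printf '%s lines, header: %s\n' "$log_lines" "$first_line"
rm -rf "$tmpdir"
```

Testing with -s rather than -e means an empty leftover file still gets a header.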

Quote:
Originally Posted by Don Cragun
In your latest code you have:
Code:
# name of file being processed
STATS_FILE=$1
# file we are writing to
LOGFILE=$1

which sets both your input file and your output file to be the same input parameter. I'm about 99% sure that isn't what you want.
Yes, I have fixed that.

Quote:
Originally Posted by Don Cragun
The cat in your code isn't helping and seems to be working against you. I think you're going to want to end up with something more like:
Code:
... ... ...

# file we are writing to
LOGFILE=$1
# The rest of the operands are stats files we need to read.
shift

awk -v labels="$LABELS" '
    ... ... ...
' "$@" > "$LOGFILE"

I have typically used cat to pipe input to awk when reading from a file. I have used the form you suggest before, but I didn't remember to this time. You are basically suggesting,

awk 'awk stuff' $INPUT_FILE > $OUTPUT_FILE

which is how I would use awk on the command line but for some reason don't in scripts.
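
As a small illustration of the difference, here is a sketch using a throwaway file (demo_stats.txt and its contents are invented, not one of the real stats files). Both forms read the same records, but only the second lets awk open the file itself, so FILENAME holds the real name:

```shell
#!/bin/sh
# Throwaway input file; name and contents are made up for this example.
tmpdir=$(mktemp -d)
input="$tmpdir/demo_stats.txt"
printf 'r2\t0.8320\nMeAE\t0.3215\n' > "$input"

# Piping through cat: awk only sees standard input.
via_cat=$(cat "$input" | awk '{ print $2 }')

# Passing the file as an operand: same records, one process fewer.
direct=$(awk '{ print $2 }' "$input")

# With a file operand, awk knows which file it is reading.
name_seen=$(awk 'NR == 1 { print FILENAME }' "$input")

printf '%s\n' "$via_cat" "$direct" "$name_seen"
rm -rf "$tmpdir"
```

Having FILENAME available is what makes it possible to feed several stats files to a single awk invocation.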

Quote:
Originally Posted by Don Cragun
Do you want your output fields to be tab separated, or do you want your output fields to be in aligned columns? Note that since your field headers vary in width from 6 characters (e.g., "test_n") to more than 9 characters (e.g., "train_MdAE") and your statistics data all fit in less than 8 characters, the two choices are mutually exclusive. (I.e., you can't have both.)
The output will be tab delimited. I did aligned columns in my sample post because I think that tab can be hard to read in plain text.

This is what I have at the moment. This has been revised to reflect your suggestion.
Code:
#!/bin/sh

# retrieve statistics from a final stats output file and write to log
# (no "function" keyword: it is not POSIX and fails under a strict /bin/sh)
extract_and_log_stats () {

   # path to stats file being processed
   STATS_FILE_TO_READ_Fc=$1
   # path to logfile where entry will be written
   OUTPUT_PATH_Fc=$2
   # name of stats file being processed (minus the path) to be entered in logfile
   STATS_FILE_Fc=$3

   # 4 sets of labels we are looking for
   LABELS_Fc='train statistics,test statistics,validate statistics,ival statistics'

   # process the stats file looking in turn for each value in LABELS
   awk -v var="$LABELS_Fc" \
       -v filename="$STATS_FILE_Fc" \
       -v OFS='\t' '                BEGIN { split(var,label_array,","); pos = 1 }
                                   F == 1 { line_array[++a_count] = $2; line_count++ }
                          line_count == 5 { F = 0; line_count = 0; pos++ }
                                 pos == 5 { printf "%s\t",filename;
                                            for(i=1; i<a_count; i++) printf "%s\t",line_array[i];
                                            printf "%s\n",line_array[a_count];
                                            exit }
                    $0 ~ label_array[pos] { F = 1; line_count = 0 }
                   ' "$STATS_FILE_TO_READ_Fc" >> "$OUTPUT_PATH_Fc"

}

This function is called with the path to the stats file being processed, the path to the logfile where the stats are written, and the name of the stats file being processed (which is also entered in the log). The function is called once for each stats file produced and appends an entry to an existing logfile.

This seems to work fine. As you pointed out, this expects the stats in LABELS to exist and be found in the same order. I have attached a sample of the stats files I am processing in case that is useful.
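
Stripped down to the core idea, the approach is to pass the list in with -v, split() it into an awk array, and advance an index each time the next label is found. A minimal sketch with invented two-line sets (the real stats files have five lines per set, and the variable names here are placeholders):

```shell
#!/bin/sh
# Invented sample data: a label line followed by "name value" lines.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample_stats.txt"
printf '%s\n' 'train statistics' 'r2 0.8320' 'n 400' \
              'test statistics'  'r2 0.7183' 'n 400' > "$sample"

labels='train statistics,test statistics'

result=$(awk -v list="$labels" '
    # Turn the comma-separated shell string into an awk array.
    BEGIN { n = split(list, want, ","); pos = 1 }
    # Found the label we are waiting for (it is used as a regex here):
    # remember the set name and move on to the next label.
    pos <= n && $0 ~ want[pos] { split(want[pos], w, " "); set = w[1]; pos++; next }
    # Every other line inside a set is a "name value" pair.
    set != "" { printf "%s_%s=%s\n", set, $1, $2 }
' "$sample")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because only want[pos] is tested on each line, a label that appears out of order is skipped rather than matched, which is the same ordering assumption noted above.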

LMHmedchem
# 9  
Old 04-10-2019
OK then. It looks like RudiC guessed correctly on everything you were trying to do and on your stat file format. And, as long as you give it at least two stat files to process, it looks to me like his code produces the output you want. If you just give it one input file, however, it won't print any headers.

By moving some of this code into a function, as shown below, and processing command line arguments as I suggested before, you seem to get what you want:
Code:
#!/bin/bash

if [ $# -lt 2 ]
then	printf 'Usage: %s output_file stat_file...\n' "${0##*/}" >&2
	exit 1
fi

# where to write the output
logfile=$1
# remaining operands are stat files to be read directly by awk...
shift

awk '
BEGIN           {LAB="train statistics|test statistics|validate statistics|ival statistics"
                }

function prec()	{if(! HDDONE)	{printf "filename"
                                 for (i=1; i<=CNT; i++) printf OFS"%s", HD[i]
                                 printf ORS
                                 HDDONE = 1
                                }
                 printf "%s", FN
                 for (i=1; i<=CNT; i++) printf OFS"%s", VAL[HD[i]]
                 printf ORS
                 split ("", VAL)
                }

FNR == 1        {if (NR != 1) prec()
		 FN = FILENAME
		}

$0 ~ LAB        {PH = $1
                 for (i=1; i<=5; i++)   {getline
                                         IX = PH "_" $1
                                         VAL[IX] = $2
                                         if (! HDDONE) HD[++CNT] = IX 
                                        }
                } 

END             {prec()
                }

' OFS="\t" "$@" > "$logfile"

And, if your system has the column utility, you can use it to print the log file this creates (which contains <tab> separated fields) into aligned text using the command:
Code:
column -t logfile.txt

where logfile.txt is the name of the output file you supplied to the above script as its first operand. For example, using the sample file you uploaded with post #8, if we invoke the script above with:
Code:
script_name logfile.txt S-mae_0.3810_V-mae_0.4956_all_B30_E800_EC503_S1v1_30.15.1.txt

and then run:
Code:
column -t logfile.txt

the output we get is:
Code:
filename                                                       train_r2  train_MeAE  train_MdAE  train_SE  train_n  test_r2  test_MeAE  test_MdAE  test_SE  test_n  validate_r2  validate_MeAE  validate_MdAE  validate_SE  validate_n  ival_r2  ival_MeAE  ival_MdAE  ival_SE  ival_n
S-mae_0.3810_V-mae_0.4956_all_B30_E800_EC503_S1v1_30.15.1.txt  0.8320    0.3215      0.2784      0.3068    400      0.7183   0.3810     0.2922     0.4129   400     0.5309       0.4956         0.4186         0.5013       400         0.0000   0.0000     0.0000     0.0000   0

Note that if you call this script with more than one stat file, it still only invokes awk once but will process all of the stat files you feed it.
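
The way a single awk invocation tells one file from the next is the FNR == 1 test: FNR restarts at 1 for each input file while NR keeps counting. A minimal sketch of that pattern, assuming two made-up input files (one.txt and two.txt) and a simple per-file sum instead of the real statistics:

```shell
#!/bin/sh
# Two invented input files with "name value" lines.
tmpdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmpdir/one.txt"
printf 'a 3\nb 4\n' > "$tmpdir/two.txt"

summary=$(cd "$tmpdir" && awk '
    # First line of each file: flush the previous file (if any), then reset.
    FNR == 1 { if (NR != 1) print name, sum; name = FILENAME; sum = 0 }
             { sum += $2 }
    # The last file has no following file to trigger the flush.
    END      { print name, sum }
' OFS='\t' one.txt two.txt)

printf '%s\n' "$summary"
rm -rf "$tmpdir"
```

This prints one tab-separated line per input file, which is the same flush-on-next-file structure that prec() implements in the script above.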

Last edited by Don Cragun; 04-10-2019 at 04:40 PM.. Reason: Fix typos: s/file/stat file/; s/me/me like/; s/to your the/to the/