Pass an array to awk to sequentially look for a list of items in a file
Hello,
I need to collect some statistical results from a series of files that are being generated by other software. The files are tab delimited. There are 4 different sets of statistics in each file where there is a line indicating what the statistic set is, followed by 5 lines of values. It looks like this,
There is more data on each line, but that is not an issue. There can also be up to 4 sets that need to be retrieved.
What I would generally do here is something like,
This would find the line containing whatever was passed in as $current_stat and start saving lines at the next line. After the 5th line has been saved, the saved array is printed and the array, save flag, and counters are reset. Of course, if we are only looking for one set of data to print, the reinitialization is not necessary and we could exit there instead.
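For readers following along, here is a minimal sketch of the single-label approach described above. It is not the original snippet: the sample data, the stat names (r2, MAE, MdAE, RMSE, n) and the function name get_stat_block are made up for illustration, and it prints each value as it goes instead of saving to an array first.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: a label line followed by 5 tab-delimited value lines.
stats_file=$(mktemp)
printf 'train statistics\nr2\t0.91\nMAE\t0.32\nMdAE\t0.25\nRMSE\t0.41\nn\t120\n' >  "$stats_file"
printf 'test statistics\nr2\t0.84\nMAE\t0.40\nMdAE\t0.31\nRMSE\t0.55\nn\t40\n'   >> "$stats_file"

# Print field 2 of the 5 lines that follow the first line containing the label.
get_stat_block() {   # usage: get_stat_block LABEL FILE  (hypothetical helper)
  awk -F'\t' -v current_stat="$1" '
    save {                                # inside the block: emit one value per line
      print $2
      if (++count == 5) exit              # only one set is wanted, so stop here
      next
    }
    index($0, current_stat) { save = 1 }  # label line: start saving at the next line
  ' "$2"
}

get_stat_block 'test statistics' "$stats_file"
```

Using index() rather than a regex match keeps the label a literal string, so characters such as "." in a label are not treated as regex operators.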
My question is about the best way to capture several sets in one pass through the file. My thought was to put the labels for what I wanted to find in an array in bash and then call awk with the array instead of a single variable. I would then look for each array element in succession until all had been found. I thought that would look like,
This was intended to start st_arr at 0 and look for whatever value was there. This doesn't work and gives the error "attempt to use scalar `st_arr' as an array". I think I have the syntax correct for passing an array to awk, but it doesn't seem to have worked. Do I need to translate the bash array into an awk array on the BEGIN line? Is this just not the right way to do this?
I would probably just save everything I captured in a single array and print it at the end instead of printing after each set is recovered. Even if the above works, I'm not sure how to avoid an array boundary error with st_arr[] since I think that the above would increment it past its size.
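One possible way to do this, sketched under assumptions: awk -v can only assign a string, so the variable arrives as a scalar, which is why awk complains when it is indexed. A common workaround is to join the bash array with a separator that cannot appear in a label and split() it in BEGIN. The separator ";", the sample data, and the names collect_sets, want and n_labels are illustrative choices, not anything from the original code. The labels are expected to appear in the file in the same order as in the array.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two sets, each a label line plus 5 value lines.
stats_file=$(mktemp)
printf 'train statistics\nr2\t0.91\nMAE\t0.32\nMdAE\t0.25\nRMSE\t0.41\nn\t120\n' >  "$stats_file"
printf 'test statistics\nr2\t0.84\nMAE\t0.40\nMdAE\t0.31\nRMSE\t0.55\nn\t40\n'   >> "$stats_file"

st_arr=('train statistics' 'test statistics')   # labels, in file order

collect_sets() {   # usage: collect_sets FILE LABEL...  (hypothetical helper)
  local file=$1
  shift
  local joined
  joined=$(IFS=';'; printf '%s' "$*")    # join the labels with ";" (assumed absent from labels)
  awk -F'\t' -v labels="$joined" '
    BEGIN { n_labels = split(labels, want, ";"); pos = 1 }   # split() indexes from 1
    save {                                 # collecting the 5 lines after a label
      out = out (out == "" ? "" : "\t") $2
      if (++count == 5) { save = 0; pos++ }   # set done: move on to the next label
      next
    }
    # The pos <= n_labels guard stops the search once every label has been found.
    pos <= n_labels && index($0, want[pos]) { save = 1; count = 0 }
    END { print out }                      # everything captured, as one tab-separated row
  ' "$file"
}

collect_sets "$stats_file" "${st_arr[@]}"
```

On the boundary worry: awk arrays are associative, so reading an element past the last label does not raise an error, it just yields an empty string. The guard is there so the search stops cleanly, not to prevent a crash.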
Thanks,
LMHmedchem
Last edited by LMHmedchem; 04-08-2019 at 02:27 PM..
What's the end objective?
For each label/stat combo, print out the sum (min/max/avg/???) of each stat?
What're you trying to do?
Dozens of these files are generated and I need to pull out some of the statistics and put them into a form where I can look at all of the results in one file.
There will probably be a header row that will be entered when the logfile is created and then a new row will be appended for each file processed.
I could do this with a separate call to awk for each set I need (train, test, etc), but that doesn't make any sense. I could pass in a separate label variable for each set I need, but that starts to look very messy after a while. It seems like passing in an array makes the most sense. It looks like awk is treating the passed in array as a single string so I was wondering if the problem is that I need to parse what awk thinks is a string into an array, or if I am incorrect in the way I am passing in the array.
If this is just a dumb idea for a solution, I would like to know about that as well.
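A rough sketch of the logfile idea, with one awk call over several files and one row appended per file. Everything concrete here is assumed for illustration: the file names run1.txt and run2.txt, the header names, the ";" separator, and the sample values. FNR == 1 is used to detect the start of each input file so the search can be reset.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
logfile=$workdir/stats_log.tsv

# Two hypothetical stats files with the same layout.
printf 'train statistics\nr2\t0.91\nMAE\t0.32\nMdAE\t0.25\nRMSE\t0.41\nn\t120\ntest statistics\nr2\t0.84\nMAE\t0.40\nMdAE\t0.31\nRMSE\t0.55\nn\t40\n' > "$workdir/run1.txt"
printf 'train statistics\nr2\t0.88\nMAE\t0.35\nMdAE\t0.27\nRMSE\t0.44\nn\t120\ntest statistics\nr2\t0.80\nMAE\t0.43\nMdAE\t0.33\nRMSE\t0.58\nn\t40\n' > "$workdir/run2.txt"

# Write the header only when the log is new or empty (header names are made up).
[ -s "$logfile" ] || printf 'file\ttrain_r2\ttrain_MAE\ttrain_MdAE\ttrain_RMSE\ttrain_n\ttest_r2\ttest_MAE\ttest_MdAE\ttest_RMSE\ttest_n\n' > "$logfile"

awk -F'\t' -v labels='train statistics;test statistics' '
  function flush_row() { if (row != "") print row }
  BEGIN { n_labels = split(labels, want, ";") }
  FNR == 1 {                          # first line of each input file
    flush_row()                       # emit the row built for the previous file
    n_parts = split(FILENAME, parts, "/")
    row = parts[n_parts]              # start the new row with the bare file name
    pos = 1
    save = 0
  }
  save {
    row = row "\t" $2
    if (++count == 5) { save = 0; pos++ }
    next
  }
  pos <= n_labels && index($0, want[pos]) { save = 1; count = 0 }
  END { flush_row() }                 # emit the row for the last file
' "$workdir"/run*.txt >> "$logfile"

cat "$logfile"
```

The output goes to a separate logfile, never back into one of the input files.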
something to start with and improve upon...
assuming all the statistics are displayed in the same order in a file AND across the files: awk -f lmh.awk myFiles where lmh.awk is:
Thank you for the reply but I don't see in the above how my stats will be found in the huge stats output file when there is no notation of how to find what I am looking for. The stats I need are the second field of the first 5 lines following "train statistics", etc. I don't see how you can find what I am looking for without "train statistics" being in there somewhere. I could post an actual file if that would help.
This version works more or less. It prints the stats I need to the logfile.
I need to format the output a bit better and trap for if pos is larger than the size of label_array. I suspect that I also don't need both a_count and line_count.
In post #1, the labels in your sample input files were "train statistics" and "test statistics". In your latest code you have labels with underscores instead of spaces. You'll have to be sure your stats files and your labels match.
In post #3, you included a header line in your output; I don't see anything in your latest code that prints that header. And, unless each of your stats files contains all of the sets of statistics and includes them in the same order, what you have shown us will end up printing data for different sets of statistics under each other in the output with no indication of which set they came from. Do each of your stats files contain statistics from all of the possible sets of statistics and are each of those sets of statistics present in the same order in each stats file?
In your latest code you have:
which sets both your input file and your output file to be the same input parameter. I'm about 99% sure that isn't what you want.
The cat in your code isn't helping and seems to be working against you. I think you're going to want to end up with something more like:
Do you want your output fields to be tab separated, or do you want your output fields to be in aligned columns? Note that since your field headers vary in width from 6 characters (e.g., "test_n") to more than 9 characters (e.g., "train_MdAE") and your statistics data all fit in less than 8 characters, the two choices are mutually exclusive. (I.e., you can't have both.)
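To make the two choices concrete, here is a small sketch that takes tab-separated rows and pads every field to a fixed width. The 12-character width, the sample header and row, and the function name to_aligned are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical header and data row, tab separated.
header=$(printf 'file\ttrain_r2\ttrain_MdAE\ttest_n')
row=$(printf 'run1.txt\t0.91\t0.25\t40')

# Tab-separated form: easy to parse later, but columns drift when widths differ.
printf '%s\n%s\n' "$header" "$row"

# Aligned form: left-justify each field in a 12-character column.
to_aligned() {   # hypothetical helper, reads tab-separated lines on stdin
  awk -F'\t' '{
    for (i = 1; i <= NF; i++) printf "%-12s", $i
    print ""                      # end the line
  }'
}

printf '%s\n%s\n' "$header" "$row" | to_aligned
```

Keeping the logfile tab separated and aligning only when viewing it is one way to get the benefit of both forms.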
Making some assumptions on your data structure (until Don Cragun's questions have been fully and finally answered), and making up my own sample data files, I have come up with
Give it a try and report back.