Bash to calculate average of all files in directory and output by part of filename


 
# 1  
Old 10-04-2016

I am trying to use awk to calculate the average of $2 over all lines for every file in a directory. The bash below seems to do that, but I cannot figure out how to capture the string before the first _ as the output file name and have the output be tab-delimited. Thank you.


Filenames in /home/cmccabe/Desktop/20x/idp
Code:
NA00449_base_counts_allidp.bed_IDP20x.txt
NA02782_base_counts_allidp.bed_IDP20x.txt

Bash
Code:
for f in /home/cmccabe/Desktop/20x/idp/*.txt ; do
     bname=$(basename $f)
     pref=${bname%%.txt}
     awk '{ sum += $2 } END { if (NR > 0) print sum / NR }' $f /home/cmccabe/Desktop/NGS/bed/bedtools/IDP_total_target_length_by_panel/IDP_unix_trim_total_target_length.bed > /home/cmccabe/Desktop/20x/idp/${pref}_average.txt
done

The data files are too large to attach, but the average is currently being calculated correctly and written out as shown below:

current output
Code:
NA00449_base_counts_allidp.bed_IDP_average.txt 98.5648

desired output (same data; only the filename differs)
Code:
NA00449_average.txt     98.5648

# 2  
Old 10-04-2016
Hi,

Can you please consider this as a starting point to get what you need?

Code:
A=NA00449_base_counts_allidp.bed_IDP_average.txt
# keep the text before the first "_" and the text after the last "_"
echo "${A%%_*}_${A##*_}"

gives output:
Quote:
NA00449_average.txt

or in awk:
Code:
echo "NA00449_base_counts_allidp.bed_IDP_average.txt" | awk -F_ '{print $1FS$NF}'
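
For readers less familiar with these expansions, the following is a small sketch of what each piece contributes. The variable names `prefix` and `suffix` are introduced here for illustration only and are not part of the original suggestion.

```shell
#!/bin/bash
A=NA00449_base_counts_allidp.bed_IDP_average.txt

# %%_* removes the longest suffix matching "_*",
# leaving the text before the first underscore.
prefix=${A%%_*}

# ##*_ removes the longest prefix matching "*_",
# leaving the text after the last underscore.
suffix=${A##*_}

echo "$prefix"              # NA00449
echo "$suffix"              # average.txt
echo "${prefix}_${suffix}"  # NA00449_average.txt
```

The awk version relies on the same idea: with `_` as the field separator, `$1` is the text before the first underscore and `$NF` is the text after the last one.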


Last edited by greet_sed; 10-04-2016 at 07:03 PM.. Reason: adding awk solution
# 3  
Old 10-05-2016
Thank you for the suggestion; it led me to the code below, which produces the desired result:

Code:
for f in /home/cmccabe/Desktop/20x/percent/*.txt ; do
     bname=$(basename $f)
     pref=${bname%%_base_*.txt}
     awk '{ sum += $2 } END { if (NR > 0) print sum / NR }' $f /home/cmccabe/Desktop/NGS/bed/bedtools/IDP_total_target_length_by_panel/IDP_unix_trim_total_target_length.bed > /home/cmccabe/Desktop/20x/coverage/${pref}_average.txt
done

Thank you for your help.
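
The first post also asked for tab-delimited output with the file name next to the average. One possible way to get that is sketched below. It makes several assumptions: it uses made-up sample data in temporary directories (`indir` and `outdir` are hypothetical names), it reads only the data file and leaves out the second .bed argument used in the loop above, and it quotes the variables so paths containing spaces would not break.

```shell
#!/bin/bash
# Hypothetical sample data; not the poster's real files.
indir=$(mktemp -d)
outdir=$(mktemp -d)
printf 'chr1\t100\nchr1\t97\n' > "$indir/NA00449_base_counts_allidp.bed_IDP20x.txt"

for f in "$indir"/*.txt; do
    bname=${f##*/}                 # file name without the directory
    pref=${bname%%_base_*.txt}     # text before "_base_", e.g. NA00449
    out="${pref}_average.txt"
    # Write the output name and the mean of column 2, separated by a tab.
    awk -v name="$out" '
        { sum += $2 }
        END { if (NR > 0) printf "%s\t%s\n", name, sum / NR }
    ' "$f" > "$outdir/$out"
done

cat "$outdir/NA00449_average.txt"
```

With the two sample rows above, the file contains `NA00449_average.txt`, a tab, and `98.5`.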
# 4  
Old 10-05-2016
Note that instead of starting a subshell for the command substitution and invoking the basename utility for every file you process, you can change:
Code:
     bname=$(basename $f)

to:
Code:
     bname=${f##*/}

This is more efficient and slightly faster, and for the paths matched by this glob it gives exactly the same results.
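
As a quick check of that claim, the sketch below runs both forms on one of the paths from this thread. The variable names `with_basename` and `with_expansion` are made up for the comparison.

```shell
#!/bin/bash
f=/home/cmccabe/Desktop/20x/percent/NA00449_base_counts_allidp.bed_IDP20x.txt

# Command substitution: starts a subshell and runs the basename utility.
with_basename=$(basename "$f")

# Parameter expansion: done by the shell itself; removes everything
# up to and including the last "/".
with_expansion=${f##*/}

echo "$with_basename"
echo "$with_expansion"
```

Both lines print `NA00449_base_counts_allidp.bed_IDP20x.txt`. The two forms can differ for paths that end in a slash, but a glob such as `*.txt` does not produce those.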