Sponsored Content
Full Discussion: Speed up bash loop?
Top Forums Shell Programming and Scripting Speed up bash loop? Post 302960602 by cmccabe on Tuesday 17th of November 2015 01:45:31 PM
Old 11-17-2015
Speed up bash loop?

I am running the below bash loop on all the files of a specific type (highlighted in bold) in a directory. There are 4 awk commands that use the input files to search another and look for a match. The input files range from 27 - 259 and are a list of names. The file that is searched is 11,137,660 lines. The loop does run, however, it takes ~20 hours to complete on a computer with 64GB and a xeon 8 core processor. Is this normal and can it be made faster (more efficient)? Thank you Smilie.

Code:
for f in /home/cmccabe/Desktop/HiQ/*base_counts.txt ; do
     bname=`basename $f`
     pref=${bname%%.txt}
     awk -f /home/cmccabe/Desktop/match.awk /home/cmccabe/Desktop/panels/PCD_unix_corrected.bed $f > /home/cmccabe/Desktop/HiQ/${pref}_PCD_coverage.
     awk -f /home/cmccabe/Desktop/match.awk /home/cmccabe/Desktop/panels/BMF_unix_corrected.bed $f > /home/cmccabe/Desktop/HiQ/${pref}_BMF_coverage.bed
     awk -f /home/cmccabe/Desktop/match.awk /home/cmccabe/Desktop/panels/PAH_unix_corrected.bed $f > /home/cmccabe/Desktop/HiQ/${pref}_PAH_coverage.bed
     awk -f /home/cmccabe/Desktop/match.awk /home/cmccabe/Desktop/panels/PID_unix_corrected.bed $f > /home/cmccabe/Desktop/HiQ/${pref}_PID_coverage.bed
done

awk
Code:
BEGIN {
  FS="[ \t|]*"
}
# Read search terms from file1 into 's'
FNR==NR {
    s[$0]
    next
}
{
    # Check if $5 matches one of the search terms
    for(i in s) {
        if($5 ~ i) {

            # Store first two fields for later usage
            a[$5]=$1
            b[$5]=$2

            # Add $9 to total of $9 per $5
            t[$5]+=$8
            # Increment count of occurences of $5
            c[$5]++

            next
        }
    }
}
END {

    # Calculate average and print output for all search terms
    # that has been found
    for( i in t ) {
        avg = t[i] / c[i]
        printf "%s:%s\t%s\t%s\n", a[i], b[i], i, avg | "sort -k3,3n"
    }
}

 

10 More Discussions You Might Find Interesting

1. Filesystems, Disks and Memory

dmidecode, RAM speed = "Current Speed: Unknown"

Hello, I have a Supermicro server with a P4SCI mother board running Debian Sarge 3.1. This is the "dmidecode" output related to RAM info: RAM speed information is incomplete.. "Current Speed: Unknown", is there anyway/soft to get the speed of installed RAM modules? thanks!! Regards :)... (0 Replies)
Discussion started by: Santi
0 Replies

2. Shell Programming and Scripting

bash and ksh: variable lost in loop in bash?

Hi, I use AIX (ksh) and Linux (bash) servers. I'm trying to do scripts to will run in both ksh and bash, and most of the time it works. But this time I don't get it in bash (I'm more familar in ksh). The goal of my script if to read a "config file" (like "ini" file), and make various report.... (2 Replies)
Discussion started by: estienne
2 Replies

3. Shell Programming and Scripting

any way to speed up calculations in bash script

hi i have a script that is taking the difference of multiple columns in a file from a value from a single row..so far i have a loop to do that.. all the data is floating point..fin has the difference between array1 and array2..array1 has 700 x 300= 210000 values and array2 has 700 values.. ... (11 Replies)
Discussion started by: npatwardhan
11 Replies

4. Shell Programming and Scripting

Using variables created sequentially in a loop while still inside of the loop [bash]

I'm trying to understand if it's possible to create a set of variables that are numbered based on another variable (using eval) in a loop, and then call on it before the loop ends. As an example I've written a script called question (The fist command is to show what is the contents of the... (2 Replies)
Discussion started by: DeCoTwc
2 Replies

5. Filesystems, Disks and Memory

data from blktrace: read speed V.S. write speed

I analysed disk performance with blktrace and get some data: read: 8,3 4 2141 2.882115217 3342 Q R 195732187 + 32 8,3 4 2142 2.882116411 3342 G R 195732187 + 32 8,3 4 2144 2.882117647 3342 I R 195732187 + 32 8,3 4 2145 ... (1 Reply)
Discussion started by: W.C.C
1 Replies

6. Shell Programming and Scripting

BASH loop inside a loop question

Hi all Sorry for the basic question, but i am writing a shell script to get around a slightly flaky binary that ships with one of our servers. This particular utility randomly generates the correct information and could work first time or may work on the 12th or 100th attempt etc !.... (4 Replies)
Discussion started by: rethink
4 Replies

7. Shell Programming and Scripting

If loop in bash

Hello, I have a script that runs a series of commands. Halfway through the script, I want it to check whether everything is going alright: if it is, to proceed with the script, if it isn't to repeat the last step until it gets it right. My code so far looks like this, simplified a bit: ... (3 Replies)
Discussion started by: Leo_Boon
3 Replies

8. Shell Programming and Scripting

Speed up the loop in shell script

Hi I have written a shell script which will test 300 to 500 IPs to find which are pinging and which are not pinging. the script which give output as 10.x.x.x is pining 10.x.x.x. is not pining - - - 10.x.x.x is pining like above. But, this script is taking... (6 Replies)
Discussion started by: kumar85shiv
6 Replies

9. Shell Programming and Scripting

Help on for loop in bash

Hi, In the code "for loop" has been used to search for files (command line arguments) in directories and then produce the result to the standard output. However, I want when no files are named on the command line, it should read a list of files from standard input and it should use the command... (7 Replies)
Discussion started by: Ra26k
7 Replies

10. Shell Programming and Scripting

Speed up extraction od tar.bz2 files using bash

The below bash will untar each tar.bz2 folder in the directory, then remove the tar.bz2. Each of the tar.bz2 folders ranges from 40-75GB and currently takes ~2 hours to extract. Is there a way to speed up the extraction process? I am using a xeon processor with 12 cores. Thank you :). ... (7 Replies)
Discussion started by: cmccabe
7 Replies
PMDABASH(1)						      General Commands Manual						       PMDABASH(1)

NAME
pmdabash - Bourne-Again SHell trace performance metrics domain agent SYNOPSIS
$PCP_PMDAS_DIR/bash/pmdabash [-C] [-d domain] [-l logfile] [-I interval] [-t timeout] [-U username] configfile DESCRIPTION
pmdabash is an experimental Performance Metrics Domain Agent (PMDA) which exports "xtrace" events from a traced bash(1) process. This includes the command execution information that would usually be sent to standard error with the set -x option to the shell. Event metrics are exported showing each command executed, the function name and line number in the script, and a timestamp. Additionally, the process identifier for the shell and its parent process are exported. This requires bash version 4 or later. A brief description of the pmdabash command line options follows: -d It is absolutely crucial that the performance metrics domain number specified here is unique and consistent. That is, domain should be different for every PMDA on the one host, and the same domain number should be used for the same PMDA on all hosts. -l Location of the log file. By default, a log file named bash.log is written in the current directory of pmcd(1) when pmdabash is started, i.e. $PCP_LOG_DIR/pmcd. If the log file cannot be created or is not writable, output is written to the standard error instead. -s Amount of time (in seconds) between subsequent evaluations of the shell trace file descriptor(s). The default is 2 seconds. -m Maximum amount of memory to be allowed for each event queue (one per traced process). The default is 2 megabytes. -U User account under which to run the agent. The default is the unprivileged "pcp" account in current versions of PCP, but in older versions the superuser account ("root") was used by default. INSTALLATION
In order for a host to export the names, help text and values for the bash performance metrics, do the following as root: # cd $PCP_PMDAS_DIR/bash # ./Install As soon as an instrumented shell script (see INSTRUMENTATION selection below) is run, with tracing enabled, new metric values will appear - no further setup of the agent is required. If you want to undo the installation, do the following as root: # cd $PCP_PMDAS_DIR/bash # ./Remove pmdabash is launched by pmcd(1) and should never be executed directly. The Install and Remove scripts notify pmcd(1) when the agent is installed or removed. INSTRUMENTATION
In order to allow the flow of event data between a bash(1) script and pmdabash, the script should take the following actions: #!/bin/sh source $PCP_DIR/etc/pcp.sh pcp_trace on $@ # enable tracing echo "awoke, $count" pcp_trace off # disable tracing The tracing can be enabled and disabled any number of times by the script. On successful installation of the agent, several metrics will be available: $ pminfo bash bash.xtrace.numclients bash.xtrace.maxmem bash.xtrace.queuemem bash.xtrace.count bash.xtrace.records bash.xtrace.parameters.pid bash.xtrace.parameters.parent bash.xtrace.parameters.lineno bash.xtrace.parameters.function bash.xtrace.parameters.command When an instrumented script is running, the generation of event records can be verified using the pmevent(1) command, as follows: $ pmevent -t 1 -x '' bash.xtrace.records host: localhost samples: all bash.xtrace.records["4538 ./test-trace.sh 1 2 3"]: 5 event records 10:00:05.000 --- event record [0] flags 0x19 (point,id,parent) --- bash.xtrace.parameters.pid 4538 bash.xtrace.parameters.parent 4432 bash.xtrace.parameters.lineno 43 bash.xtrace.parameters.command "true" 10:00:05.000 --- event record [1] flags 0x19 (point,id,parent) --- bash.xtrace.parameters.pid 4538 bash.xtrace.parameters.parent 4432 bash.xtrace.parameters.lineno 45 bash.xtrace.parameters.command "(( count++ ))" 10:00:05.000 --- event record [2] flags 0x19 (point,id,parent) --- bash.xtrace.parameters.pid 4538 bash.xtrace.parameters.parent 4432 bash.xtrace.parameters.lineno 46 bash.xtrace.parameters.command "echo 'awoke, 3'" 10:00:05.000 --- event record [3] flags 0x19 (point,id,parent) --- bash.xtrace.parameters.pid 4538 bash.xtrace.parameters.parent 4432 bash.xtrace.parameters.lineno 47 bash.xtrace.parameters.command "tired 2" 10:00:05.000 --- event record [4] flags 0x19 (point,id,parent) --- bash.xtrace.parameters.pid 4538 bash.xtrace.parameters.parent 4432 bash.xtrace.parameters.lineno 38 bash.xtrace.parameters.function "tired" bash.xtrace.parameters.command "sleep 2" FILES
$PCP_PMCDCONF_PATH command line options used to launch pmdabash $PCP_PMDAS_DIR/bash/help default help text file for the bash metrics $PCP_PMDAS_DIR/bash/Install installation script for the pmdabash agent $PCP_PMDAS_DIR/bash/Remove undo installation script for pmdabash $PCP_LOG_DIR/pmcd/bash.log default log file for error messages and other information from pmdabash PCP ENVIRONMENT
Environment variables with the prefix PCP_ are used to parameterize the file and directory names used by PCP. On each installation, the file /etc/pcp.conf contains the local values for these variables. The $PCP_CONF variable may be used to specify an alternative configura- tion file, as described in pcp.conf(5). SEE ALSO
bash(1), pmevent(1) and pmcd(1). Performance Co-Pilot PCP PMDABASH(1)
All times are GMT -4. The time now is 05:55 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy