I am running the bash loop below on all the files of a specific type (highlighted in bold) in a directory. There are 4 awk commands, each of which uses an input file to search another file for matches. The input files are lists of names, ranging from 27 to 259 entries each. The file that is searched has 11,137,660 lines. The loop does run, but it takes ~20 hours to complete on a computer with 64 GB and an 8-core Xeon processor. Is this normal, and can it be made faster (more efficient)? Thank you.
The file to search, matching on $5 (the 11,137,660-line file):
So the expected output would be:
Only $4 and $5 of the lines where a match was found, plus the average of $7, are printed.
Thank you very much.
In your code, you are saving $1 in a[] and $2 in b[] and at the end you are printing them with a colon between them. In your sample data above, $4 is always the same as $1:$2. Does that same relationship occur in all lines in your file? (Saving and printing $4 in an array will be faster than saving $1 in an array, saving $2 in another array, and printing both of them.) And, you say above that you want the output to be $4, $5, and the average, but you show the output being $4, $5, a "|", $6, and the average??? Please clarify!
Your sample output above shows that the average of 1, 2, and 3 is 3. Why not 2 (i.e., (1+2+3)/3)? How many decimal places do you want printed in the average?
Are your search strings always to be exactly matched by the string starting with the 1st character of $5 and ending with the character before the <minus-sign> character in $5? (Your script will run MUCH faster if you perform one test to determine if a string is a subscript in an array instead of an average of 14-130 regular expression matches.)
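To illustrate the array-subscript test described above, here is a minimal sketch. It assumes the name to look up is everything in $5 before the first "-", that $7 is the value to average, and that averages are taken over all lines sharing the same $5 (the thread does not state the grouping, so that part is a guess). The function name `avg_by_name`, the file names, and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function name, file names, and sample rows are hypothetical.

# avg_by_name NAMES BIG
# Prints $4 (of the first matching line), $5, and the mean of $7 for every
# distinct $5 whose name part (text before the first "-") is listed in NAMES.
avg_by_name() {
    awk '
    NR == FNR { want[$1]; next }        # first file: names become array subscripts
    {
        name = $5
        sub(/-.*/, "", name)            # AGRN-6|gc=75 -> AGRN
        if (!(name in want)) next       # one hash lookup per line, no regex loop
        if (!($5 in n)) { order[++k] = $5; first[$5] = $4 }
        sum[$5] += $7
        n[$5]++
    }
    END {
        for (i = 1; i <= k; i++) {
            key = order[i]
            printf("%s %s %.1f\n", first[key], key, sum[key] / n[key])
        }
    }' "$1" "$2"
}

# Tiny demo with invented data (assumed layout: $4 is $1:$2, $7 is the value).
demo=$(mktemp -d)
printf 'AGRN\nCCDC39\n' > "$demo/names.txt"
printf '%s\n' \
    'chr1 955543 955544 chr1:955543 AGRN-6|gc=75 1 1' \
    'chr1 955544 955545 chr1:955544 AGRN-6|gc=75 2 2' \
    'chr1 955545 955546 chr1:955545 AGRN-6|gc=75 3 3' \
    'chr2 100 101 chr2:100 XYZ-1|gc=50 1 9' > "$demo/big.txt"
avg_by_name "$demo/names.txt" "$demo/big.txt"   # prints: chr1:955543 AGRN-6|gc=75 2.0
rm -rf "$demo"
```

Each line of the big file costs one `in` test regardless of how many names are in the list, which is where the speed-up over looping through regular expressions would come from.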
You're reading a 3/4 GB file four times - I don't know if disk I/O buffering will easily cater for that. Why don't you read your four .bed files into four different (multidimensional?) arrays (259 is not too large an array element count), then do your four independent calculations on each input line of the large file, and then output to the four different result files?
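A rough sketch of that single-pass idea follows. It assumes each .bed file holds one name per line, that the name in the big file is the part of $5 before the first "-", and that $7 is averaged per distinct $5. The function name `single_pass`, the `.out` suffix for result files, and the sample data are all invented for the example, not taken from the original script.

```shell
#!/bin/sh
# Sketch only: function name, output naming, and sample rows are hypothetical.

# single_pass BIG LIST...
# Reads every LIST into one awk array, scans BIG once, and writes the
# results for each LIST to LIST.out.
single_pass() {
    big=$1
    shift
    awk -v big="$big" '
    FILENAME != big {                   # still reading one of the name lists
        lists[FILENAME]
        member[FILENAME, $1]
        next
    }
    {
        name = $5
        sub(/-.*/, "", name)            # text before the first "-"
        for (f in lists)
            if ((f, name) in member) {
                key = f SUBSEP $5
                if (!(key in n)) { order[++k] = key; first[key] = $4 }
                sum[key] += $7
                n[key]++
            }
    }
    END {
        for (i = 1; i <= k; i++) {
            key = order[i]
            split(key, part, SUBSEP)    # part[1] = list file, part[2] = $5
            out = part[1] ".out"
            printf("%s %s %.1f\n", first[key], part[2], sum[key] / n[key]) > out
        }
    }' "$@" "$big"
}

# Tiny demo with invented data.
demo=$(mktemp -d)
printf 'AGRN\n' > "$demo/list1.bed"
printf 'XYZ\nAGRN\n' > "$demo/list2.bed"
printf '%s\n' \
    'chr1 955543 955544 chr1:955543 AGRN-6|gc=75 1 1' \
    'chr1 955544 955545 chr1:955544 AGRN-6|gc=75 2 2' \
    'chr1 955545 955546 chr1:955545 AGRN-6|gc=75 3 3' \
    'chr2 100 101 chr2:100 XYZ-1|gc=50 1 9' > "$demo/big.txt"
single_pass "$demo/big.txt" "$demo/list1.bed" "$demo/list2.bed"
cat "$demo/list1.bed.out" "$demo/list2.bed.out"
rm -rf "$demo"
```

With four lists, each line of the big file is read once and checked with at most four array lookups, instead of the file being read four times.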
Quote:
In your code, you are saving $1 in a[] and $2 in b[] and at the end you are printing them with a colon between them. In your sample data above, $4 is always the same as $1:$2. Does that same relationship occur in all lines in your file? (Saving and printing $4 in an array will be faster than saving $1 in an array, saving $2 in another array, and printing both of them.)
Yes, $4 is always the same as $1:$2.
Quote:
And, you say above that you want the output to be $4, $5, and the average, but you show the output being $4, $5, a "|", $6, and the average??? Please clarify!
The output should be $4, $5, a "|", $6, and the average.
Quote:
Your sample output above shows that the average of 1, 2, and 3 is 3. Why not 2 (i.e., (1+2+3)/3)? How many decimal places do you want printed in the average?
You are correct that 2 (i.e., (1+2+3)/3) is better, and just one decimal place in the average.
Quote:
Are your search strings always to be exactly matched by the string starting with the 1st character of $5 and ending with the character before the <minus-sign> character in $5? (Your script will run MUCH faster if you perform one test to determine if a string is a subscript in an array instead of an average of 14-130 regular expression matches.)
Yes, it is the first character in $5 up to the "-" sign (so in AGRN-6|gc=75 it is AGRN).
Thank you.
---------- Post updated at 10:23 AM ---------- Previous update was at 10:01 AM ----------
@RudiC
I'm not sure what you mean, sorry, and thank you.
Last edited by cmccabe; 11-18-2015 at 12:22 PM..
Reason: fixed format