I want to compute the number of bitwise matches, in pairwise fashion, for all columns. The problem is that I have 18486955 rows and 750 columns. Please help with code. I believe this will take a lot of time; is there a way of tracking progress?
Input
Output
What do the numbers in the 3rd field of your output mean? It isn't the number of different pairings found (or Org1 Org3 would be 3). It isn't the number of times both elements of the pairing are the same (or Org1 Org3 would be 0).
The third number is the number of matches (bitwise). Org1 is AAA and Org3 is TAG, so only the middle A matches, hence the 1. Likewise, comparing AAA and AAG gives 2, and comparing TAG and GTA gives 0.
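To make the counting rule concrete, here is a minimal Python sketch of the per-pair match count described above. The function name `count_matches` is made up for illustration; the three test strings are the ones given in the post.

```python
def count_matches(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings hold the same character."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


print(count_matches("AAA", "TAG"))  # 1 (only the middle A matches)
print(count_matches("AAA", "AAG"))  # 2
print(count_matches("TAG", "GTA"))  # 0
```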
---------- Post updated 11-08-13 at 10:40 AM ---------- Previous update was 11-07-13 at 10:54 PM ----------
Quote:
Yes, my initial testing indicates it may take 3 or 4 weeks of runtime! The following logs each block of 100 lines processed to stderr:
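The awk code from this post did not survive in the thread as shown here. As a rough stand-in, and not the original script, the same idea can be sketched in Python: count matches for every pair of columns and write a progress line to stderr after each block of lines. The input layout (a header line of column names, then one whitespace-separated value per column on each row) and the names `pairwise_matches` and `report_every` are assumptions.

```python
import sys
from itertools import combinations


def pairwise_matches(lines, report_every=100, log=sys.stderr):
    """Return (names, counts); counts[(i, j)] is the number of rows in
    which columns i and j hold the same value.

    Assumed layout (not confirmed by the thread): the first line holds
    the column names, every later line holds one value per column.
    """
    rows = iter(lines)
    names = next(rows).split()
    pairs = list(combinations(range(len(names)), 2))
    counts = dict.fromkeys(pairs, 0)
    for n, line in enumerate(rows, start=1):
        cells = line.split()
        for i, j in pairs:
            if cells[i] == cells[j]:
                counts[(i, j)] += 1
        # Progress goes to stderr so it does not mix with the results.
        if n % report_every == 0:
            print(f"{n} lines processed", file=log)
    return names, counts


# Tiny example: read column-wise, Org1 is AAA, Org2 is AAG, Org3 is TAG.
sample = ["Org1 Org2 Org3", "A A T", "A A A", "A G G"]
names, counts = pairwise_matches(sample, report_every=2)
for (i, j), c in counts.items():
    print(names[i], names[j], c)
```

For a real file, passing an open file object as `lines` keeps memory use flat, since rows are consumed one at a time.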
OK, I guess I have to wait. I just started the process with & at the end. I believe it will continue running in the background even if I close the remote terminal?
Logging out will send a SIGHUP to your background processes, causing them to stop. disown will keep the jobs from receiving that signal; see the man page.
Using nohup when sending the job to the background has a similar effect.
Hi.
Most modern computers have multiple cores and/or multiple CPUs. This task is very CPU-intensive: about 280,000 comparisons per line (if my binomial calculation is correct).
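The binomial estimate can be checked quickly. This small Python sketch uses the row and column counts from the original question; the variable names are only for illustration.

```python
import math

columns = 750
rows = 18_486_955

# Number of unordered column pairs: C(750, 2) = 750 * 749 / 2
pairs_per_line = math.comb(columns, 2)
total_comparisons = pairs_per_line * rows

print(pairs_per_line)  # 280875
print(total_comparisons)
```

So "about 280,000 comparisons per line" holds, and the whole file needs on the order of 5 x 10^12 comparisons.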
So it makes sense to try and utilize all the power that the computer has. Here is an example that uses the awk code of Chubler_XL (which I will not list -- it is in a separate file "a1").
The idea is that the input file is split up and many instances are run simultaneously (hence "parallel"). This script will run 1, 2, and 4 instances. The computer is a beefy server with a 3-GHz Xeon CPU with 4 cores, each with hyper-threading:
producing:
The user time will always be about the same because we need to do n operations regardless of how many processes are running. The real time, however, decreases almost linearly with the addition of "jobs" (processes and, in this case, effectively cores). So one might expect a 20-fold decrease if one had 20 CPUs available. In reality, there is a slight amount of overhead from parallel, but I noticed a decrease in real time even with more than 1 job and a single CPU (on a different computer). Although this is CPU-intensive, there may be disk contention if there is a large number of processes. There is no way to predict how many processes count as "large", so testing will need to be done if many cores are available.
The output files are collected, and will need to be reduced to gather similar counts of the pairs. Outside of debugging, this seems like the only downside to me.
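The reduce step mentioned above could look something like the following Python sketch: each job emits its own per-pair counts, and the counts for the same pair are summed. The line format `Org1 Org3 1` is assumed from the output discussed earlier in the thread, and `merge_partial_counts` is a hypothetical helper, not part of the original script.

```python
from collections import Counter


def merge_partial_counts(chunks):
    """Sum per-pair counts across the outputs of several jobs.

    Each chunk is an iterable of lines shaped like "Org1 Org3 1"
    (an assumed format based on the output described in the thread).
    """
    totals = Counter()
    for chunk in chunks:
        for line in chunk:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            first, second, count = fields
            totals[(first, second)] += int(count)
    return totals


# Made-up partial outputs from two jobs, for illustration only.
job1 = ["Org1 Org2 2", "Org1 Org3 1"]
job2 = ["Org1 Org2 5", "Org1 Org3 0"]
totals = merge_partial_counts([job1, job2])
for (a, b), c in sorted(totals.items()):
    print(a, b, c)
```

Because the per-pair counts are plain sums over rows, it does not matter how the input rows were divided among the jobs.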
I recognize that this may be too advanced for the OP, but if he has time to spend over weeks waiting for the output, then perhaps he could enlist the help of a colleague.
For purposes of comparisons of methods that others may propose, I have uploaded a copy of the raw 1000-line 750 field/line text data file. The comments at the beginning of the file describe the file. As noted, I used only the first 100 lines.
Hello All,
I am writing a basic state machine which maintains 8 different states, and there is a possibility that the system may be in multiple states at a time (except for state1 to state3, meaning only one state can be active at a time from state1 to state3).
I have declared... (9 Replies)
Please, can anyone suggest how to use the bitwise || operator to do this?
#initialize a=0 b=0
#condition
if then
a=0
else a=1
fi
#bitwise or operation b = a || b
Please view this code tag video for how to use code tags when posting code and data. (3 Replies)
The purpose of this article is to reveal the unrevealed parts of the bitwise XOR.
As we are aware, the truth table for the XOR operator is:
A B A^B
0 0 0
0 1 1
1 0 1
1 1 0
For example , 1^2 will be calculated as given below:
First the operands... (1 Reply)
Hello All,
I have two 16-bit binaries in two different variables. I want to perform a bitwise AND between the two and store the result in a different variable.
Can anyone throw some light on doing this in a Bourne shell...
eg var1= 1110101010101011
... (8 Replies)
Dear all
I have a large file w. ~ 10 million lines.
The first two cols have matching partners.
For example:
A A
A B
B B
or
A A
B A
B B
The matches may be separated by an unknown number of lines.
My intention is to group them and add a "group" value in the last col.
For... (12 Replies)
Hi
Suppose we have these code lines:
#define _IN_USE 0x001 /* set when process slot is in use */
#define _EXITING 0x002 /* set when exit is expected */
#define _REFRESHING 0x004
...
1 main () {
2
3 unsigned r_flags =_REFRESHING;
4
5 if (r_flag &... (3 Replies)
I am taking an online course on Unix scripting. The topic is Unix arithmetic operators and the lesson is Logical and bitwise operations. It is not clear how much storage space Unix uses to represent integers that are typed. Bitwise negation caused me to question how many bits are used to... (3 Replies)
Hi !
How to reset a variable to 0 after a reset value, say 10 using bitwise
XOR.
For example,
int cnt=0;
if(cnt<10)
cnt++;
else
cnt = 0;
How can we achieve this by using XOR only.
thanks, (1 Reply)
Hi,
Is it possible to use bitwise operators in bit fields?
For example:
typedef struct Mystruct {
unsigned char A :1 ;
unsigned char B :1 ;
} Mystruct;
and assume
struct Mystruct STR_1S, STR_2S, tempSTRS = {0};
then the following line:
tempSTRS = STR_1S & STR_2S;
gives the... (3 Replies)