## Bitwise comparison of cols

# 1
11-07-2013

Hello,

I want to count, for every pair of columns, the number of bitwise (position-by-position) matches. The problem is that I have 18486955 rows and 750 columns. Please help with code. I believe this will take a lot of time; is there a way of tracking progress?

Input
Output
# 2
11-07-2013
What do the numbers in the 3rd field of your output mean? It isn't the number of different pairings found (or Org1 Org3 would be 3). It isn't the number of times both elements of the pairing are the same (or Org1 Org3 would be 0).
Don Cragun
# 3
11-07-2013
Try this

---------- Post updated at 10:55 AM ---------- Previous update was at 10:02 AM ----------

Quote:
I believe this will take a lot of time, is there a way of tracking progress?
Yes, my initial testing indicates it may take 3 or 4 weeks of runtime! The following version logs progress to stderr after each block of 100 lines processed:
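As a hypothetical illustration of the same approach (this is not Chubler_XL's awk code), here is a minimal Python sketch that counts matches for every column pair and reports progress every `report_every` data lines. It assumes whitespace-separated fields with a first line of column names (e.g. `Org1 Org2 ...`); the names `count_pairwise_matches`, `report_every` and `on_progress` are made up for this sketch. It is meant to show the structure, not to be fast.

```python
import sys
from itertools import combinations


def _log_progress(rows):
    """Default progress reporter: write the running line count to stderr."""
    print(f"{rows} lines processed", file=sys.stderr)


def count_pairwise_matches(lines, report_every=100, on_progress=_log_progress):
    """Count, for every pair of columns, the rows where both columns hold the same value.

    Assumes whitespace-separated fields and a first line of column names.
    Returns a dict mapping (name_i, name_j) -> match count, with i < j.
    """
    it = iter(lines)
    names = next(it).split()
    n = len(names)
    counts = [[0] * n for _ in range(n)]  # only entries with i < j are used
    rows = 0
    for line in it:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        for i in range(n - 1):
            value, row_counts = fields[i], counts[i]
            for j in range(i + 1, n):
                if value == fields[j]:
                    row_counts[j] += 1
        rows += 1
        if rows % report_every == 0:
            on_progress(rows)  # e.g. every 100 data lines
    return {(names[i], names[j]): counts[i][j] for i, j in combinations(range(n), 2)}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for (a, b), c in count_pairwise_matches(fh).items():
            print(a, b, c)
```

For example, `python3 pairs.py data.txt > pairs.out 2> progress.log` (file names hypothetical) keeps the progress messages separate from the results.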

Last edited by Chubler_XL; 11-07-2013 at 09:11 PM..
Chubler_XL
# 4
11-08-2013
Quote:
Originally Posted by Don Cragun
What do the numbers in the 3rd field of your output mean? It isn't the number of different pairings found (or Org1 Org3 would be 3). It isn't the number of times both elements of the pairing are the same (or Org1 Org3 would be 0).

The third number is the number of (bitwise) matches. Org1 is AAA and Org3 is TAG, so only the middle A matches, hence the 1. Likewise, comparing AAA and AAG gives 2, and comparing TAG and GTA gives 0.
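The match definition given here (count the positions at which two columns hold the same character) can be written as a small Python function; the name `match_count` is hypothetical, and the calls reproduce the three examples from this post.

```python
def match_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences hold the same character."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


print(match_count("AAA", "TAG"))  # 1
print(match_count("AAA", "AAG"))  # 2
print(match_count("TAG", "GTA"))  # 0
```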

---------- Post updated 11-08-13 at 10:40 AM ---------- Previous update was 11-07-13 at 10:54 PM ----------

Quote:
Yes, my initial testing indicates it may take 3 or 4 weeks of runtime! The following version logs progress to stderr after each block of 100 lines processed:
OK, I guess I have to wait. I just started the process with & at the end. I believe it will keep running in the background even if I close the remote terminal?
# 5
11-08-2013
Logging out will send a SIGHUP to your background processes, causing them to stop. `disown` will make the jobs not receive that signal; see the man page.
Using `nohup` when sending the job to the background will do something similar.
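The core of what `nohup` does is to run the command with SIGHUP ignored. A process can do the same for itself, sketched here with Python's standard `signal` module (POSIX only; the function name `ignore_hangup` is hypothetical). Note that `nohup` also appends output to `nohup.out` when standard output is a terminal, which this sketch does not do.

```python
import os
import signal


def ignore_hangup():
    """Ignore SIGHUP (sent on logout or terminal hang-up), as nohup does for its command."""
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


if __name__ == "__main__":
    ignore_hangup()
    # Send ourselves SIGHUP: with the default action this would terminate the process.
    os.kill(os.getpid(), signal.SIGHUP)
    print("still running after SIGHUP")
```

From the shell, the usual equivalents are starting the job as `nohup command &`, or running `disown` on a job that is already in the background.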
RudiC
# 6
11-08-2013
Hi.

Most modern computers have multiple cores and/or multiple CPUs. This task is very CPU-intensive: about 280,000 comparisons per line (if my binomial calculation is correct).
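The binomial count can be checked directly. Each line has C(750, 2) unordered column pairs, and multiplying by the row count from the question gives the number of comparisons for the whole file.

```python
from math import comb

ROWS = 18_486_955  # rows in the original question
COLS = 750         # columns in the original question

pairs_per_line = comb(COLS, 2)   # 750 * 749 / 2 unordered column pairs
total = ROWS * pairs_per_line    # comparisons over the whole file

print(pairs_per_line)  # 280875
print(f"{total:,}")    # 5,192,523,485,625
```

So "about 280,000" per line is right, and the full file needs roughly 5.2 × 10^12 comparisons, which helps explain the multi-week estimate.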

So it makes sense to try and utilize all the power that the computer has. Here is an example that uses the awk code of Chubler_XL (which I will not list -- it is in a separate file "a1").

The idea is that the input file is split up and many instances are run simultaneously (hence "parallel"). This script will run 1, 2, and 4 instances. The computer is a beefy server with a 3 GHz Xeon CPU that has 4 cores, each with hyper-threading:
producing:
The user time will always be about the same, because we need to do n operations regardless of how many processes are running. The real time, however, decreases almost linearly with the addition of "jobs" (processes and, in this case, effectively cores). So one might expect roughly a 20-fold decrease with 20 CPUs available. In reality, there is a slight amount of overhead from parallel, but I noticed a decrease in real time even with more than 1 job on a single CPU (on a different computer). Although this task is CPU-intensive, there may be disk contention with a large number of processes. There is no way to predict how large "large" is, so testing will be needed if many cores are available.
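The split-and-run idea described above can be sketched with Python's standard `concurrent.futures` module in place of GNU `parallel` (a swapped-in technique, not drl's script): the rows are cut into contiguous chunks, each chunk is counted in its own worker process, and the partial counts are summed. The helper names are hypothetical, and pairs are keyed by 0-based column index.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def count_chunk(rows):
    """Match counts keyed by (i, j) column indices, i < j, for one chunk of split rows."""
    counts = Counter()
    for fields in rows:
        n = len(fields)
        for i in range(n - 1):
            for j in range(i + 1, n):
                if fields[i] == fields[j]:
                    counts[i, j] += 1
    return counts


def split_rows(rows, n_chunks):
    """Cut rows into at most n_chunks contiguous pieces of nearly equal size."""
    if not rows:
        return []
    size = -(-len(rows) // n_chunks)  # ceiling division
    return [rows[k:k + size] for k in range(0, len(rows), size)]


def parallel_counts(rows, jobs=4):
    """Count each chunk in a separate worker process, then reduce by summing."""
    total = Counter()
    with ProcessPoolExecutor(max_workers=jobs) as pool:
        for partial in pool.map(count_chunk, split_rows(rows, jobs)):
            total.update(partial)  # Counter.update adds counts rather than replacing them
    return total
```

On platforms that start workers with the spawn method (Windows, and macOS by default), `parallel_counts` must be called from under an `if __name__ == "__main__":` guard.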

The output files are collected, and will need to be reduced by combining the counts for identical pairs. Apart from debugging, this seems to me like the only downside.
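The reduction step can be as simple as summing the third field for each pair across all partial output files. A minimal Python sketch, assuming each output line has the `Org1 Org3 1` form discussed earlier in the thread (the function name is hypothetical):

```python
from collections import Counter


def reduce_counts(line_sources):
    """Sum per-pair counts from several partial results.

    Each source is an iterable of lines such as 'Org1 Org3 1'
    (an open file object works, since iterating a file yields its lines).
    """
    totals = Counter()
    for lines in line_sources:
        for line in lines:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            a, b, n = parts
            totals[a, b] += int(n)
    return totals
```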

I recognize that this may be too advanced for the OP, but if he has time to spend over weeks waiting for the output, then perhaps he could enlist the help of a colleague.

For comparing methods that others may propose, I have uploaded a copy of the raw text data file (1000 lines, 750 fields per line). The comments at the beginning of the file describe it. As noted, I used only the first 100 lines.

Best wishes ... cheers, drl
drl
# 7
11-09-2013
Out of sheer curiosity: having the main loop run only to NF-1 might yield a run-time reduction of a fraction of a percent. Unfortunately, replacing the by
increases run time dramatically, probably because the ++ operation is a register operation while the other is a full addition.
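The loop-bound point can be illustrated with two small Python generators over awk-style 1-based field numbers (hypothetical names). With the inner loop running from i+1 to NF, the outer pass i = NF yields nothing, so stopping at NF-1 only saves that one empty pass per line, in line with a saving of a fraction of a percent.

```python
def field_pairs(nf):
    """Yield (i, j), 1 <= i < j <= nf, using awk-style 1-based field numbers."""
    for i in range(1, nf):              # i = 1 .. nf-1, like: for (i = 1; i < NF; i++)
        for j in range(i + 1, nf + 1):  # j = i+1 .. nf
            yield i, j


def field_pairs_to_nf(nf):
    """Same loops with the outer bound at nf; the last outer pass yields nothing."""
    for i in range(1, nf + 1):
        for j in range(i + 1, nf + 1):
            yield i, j
```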
RudiC

## Bitwise operation for state machine

Hello all, I am writing a basic state machine that maintains 8 different states, and the system may be in multiple states at a time (except for state1 to state3, meaning only one state from state1 to state3 can be active at a time). I have declared...

## how to use bitwise or operator in /bin/sh

Can anyone please suggest how to use the bitwise || operator to do this: #initialize a=0 b=0 #condition if then a=0 else a=1 fi #bitwise or operation b = a || b

## Analysis in bitwise XOR

The purpose of this article is to reveal the lesser-known aspects of the bitwise XOR. As we are aware, the truth table for the XOR operator is:

| A | B | A^B |
|---|---|-----|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

For example, 1^2 will be calculated as given below: First the operands...
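The truth table and the 1^2 example can be checked in Python, whose `^` operator is bitwise XOR on integers: 1 is 0b01 and 2 is 0b10, the bits differ in both positions, so the result is 0b11, i.e. 3. The function name is hypothetical.

```python
def xor_truth_table():
    """(A, B, A^B) for every combination of single-bit inputs."""
    return [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]


for row in xor_truth_table():
    print(*row)

# 1 = 0b01, 2 = 0b10: the bits differ in both positions, so the XOR is 0b11.
print(1 ^ 2, bin(1 ^ 2))  # 3 0b11
```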

## bitwise and between two 32 bit binaries

Hello all, I have two 16-bit binary values in two different variables. I want to perform a bitwise AND between the two and store the result in a different variable. Can anyone throw some light on doing this in a Bourne shell... e.g. var1= 1110101010101011 ...
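POSIX `sh` arithmetic expansion is only required to accept decimal, octal and hexadecimal constants, so binary strings have to be converted before they can be ANDed (bash and ksh also accept a `2#...` base prefix). The conversion logic is sketched here in Python; `and_binary_strings` and the second operand are hypothetical.

```python
def and_binary_strings(x: str, y: str) -> str:
    """Bitwise AND of two binary digit strings, zero-padded to the wider input."""
    width = max(len(x), len(y))
    return format(int(x, 2) & int(y, 2), f"0{width}b")


var1 = "1110101010101011"  # value from the question
var2 = "1010101010101010"  # hypothetical second operand
print(and_binary_strings(var1, var2))  # 1010101010101010
```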

## Grouping matches by cols

Dear all, I have a large file with ~10 million lines. The first two columns contain matching partners. For example: A A A B B B or A A B A B B. The matches may be separated by an unknown number of lines. My intention is to group them and add a "group" value in the last column. For...

## bitwise and if

Hi, suppose we have these lines of code: #define _IN_USE 0x001 /* set when process slot is in use */ #define _EXITING 0x002 /* set when exit is expected */ #define _REFRESHING 0x004 ... main () { unsigned r_flags =_REFRESHING; if (r_flag &...

## Bitwise negation

I am taking an online course on Unix scripting. The topic is Unix arithmetic operators and the lesson is Logical and bitwise operations. It is not clear how much storage space Unix uses to represent integers that are typed. Bitwise negation caused me to question how many bits are used to...
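The number of bits only matters once the result of a negation is viewed as unsigned. In two's complement, ~x equals -x-1 at any width, which is why Python (with unbounded integers) and shells that use signed two's-complement arithmetic typically both give -6 for ~5. A Python sketch with a hypothetical `bits` parameter shows the same negation at different widths:

```python
def bitwise_not(x: int, bits: int = 64) -> int:
    """~x truncated to an unsigned integer of the given width."""
    mask = (1 << bits) - 1
    return ~x & mask


print(~5)                   # -6, i.e. -(5) - 1
print(bitwise_not(5, 8))    # 250 == 0b11111010
print(hex(bitwise_not(5)))  # 0xfffffffffffffffa
```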

## resetting counter using bitwise XOR

Hi! How can I reset a variable to 0 after it reaches a reset value, say 10, using bitwise XOR? For example: int cnt=0; if(cnt<10) cnt++; else cnt = 0; How can we achieve this using XOR only? Thanks,

## bitwise operators

Can anybody write a program to divide one number by another using bitwise operators?
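One standard approach is binary long division (shift-and-subtract), sketched below in Python for non-negative integers. The function name is hypothetical, and the sketch uses subtraction and comparison alongside shifts, AND and OR, so it is not purely bitwise.

```python
def divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Binary long division using shifts, AND/OR, comparison and subtraction.

    Returns (quotient, remainder). Handles non-negative operands only.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if dividend < 0 or divisor < 0:
        raise ValueError("this sketch handles non-negative numbers only")
    quotient, remainder = 0, 0
    for bit in range(dividend.bit_length() - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)  # bring down the next bit
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit  # this quotient bit is 1
    return quotient, remainder


print(divide(100, 7))  # (14, 2)
```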

## Bit-fields and Bitwise operators

Hi, is it possible to use bitwise operators on bit-fields? For example: typedef struct Mystruct { unsigned char A :1 ; unsigned char B :1 ; } Mystruct; and assume struct Mystruct STR_1S, STR_2S, tempSTRS = {0}; then the following line: tempSTRS = STR_1S & STR_2S; gives the...