How many studies have unequal values for each pair?
I have several studies (s), each of which has points (p) with values (v).
My goal is to determine, for each pair of points, how many studies have different values for the two points (if both are available).
For example, the pair (p1,p5) is involved in 2 studies, STUDY 1 (value1, value3) and STUDY 3 (value1, value5), and the two values differ in both. So the count for this pair is 2. The pair (p1,p3) is present in both studies 1 and 3 with the same values, so its count is 0.
So my desired output is
I do have a working solution that handles a small dataset, but it runs forever on the actual dataset, which has several thousand factors in each column.
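To make the requirement concrete, here is a minimal sketch in Python (not the awk used elsewhere in this thread). It assumes each record is a (study, point, value) triple; that layout, the function name `count_differing_pairs`, and the sample records are all hypothetical, chosen only to match the (p1,p5) and (p1,p3) example described above.

```python
from collections import defaultdict
from itertools import combinations


def count_differing_pairs(records):
    """Count, per pair of points, the studies where both points appear
    with different values.

    records: iterable of (study, point, value) tuples (assumed layout).
    Returns {(point_a, point_b): count} with point_a < point_b.
    """
    # Group the values by study: {study: {point: value}}
    by_study = defaultdict(dict)
    for study, point, value in records:
        by_study[study][point] = value

    counts = defaultdict(int)
    for points in by_study.values():
        # Only pairs that actually share a study are visited.
        for a, b in combinations(sorted(points), 2):
            counts[(a, b)] += points[a] != points[b]
    return dict(counts)


# Hypothetical records, built only to mirror the example in the question.
records = [
    ("s1", "p1", "value1"), ("s1", "p3", "value1"), ("s1", "p5", "value3"),
    ("s3", "p1", "value1"), ("s3", "p3", "value1"), ("s3", "p5", "value5"),
]
result = count_differing_pairs(records)
print(result[("p1", "p5")])  # 2
print(result[("p1", "p3")])  # 0
```

Because pairs are generated one study at a time, the work grows with the number of points per study rather than with every possible pair of points in the whole file. Pairs with a count of 0 are kept here; filter them out if they should not be reported.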
You are right, that pair will be 2. Thanks a lot. I need to understand your code now.
Hello senhia83,
First of all, thank you for asking a good question and for showing us what you have done to solve it. Keep it up.
Coming to your question, could you please try following.
One-liner form of the solution:
Non-one-liner form of the solution:
NOTE: This assumes that field 2 will always contain a number (digit), as in the Input_file you showed. I also tested this with GNU awk.
Thanks,
R. Singh
Last edited by RavinderSingh13; 07-29-2016 at 11:19 AM.
Reason: Changed the non-one liner form solution's spaces and fit them to good Looking one :)
Assuming that your input file is always sorted with all lines for a given study adjacent to each other (as in your sample dataset file), you might want to try the following awk script to handle your problem:
which, with your sample data, produces the output:
or, with unsorted output instead of sorted output:
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
Note that if I understand the requirements correctly, I don't believe the code suggested by rdrtx1 or RavinderSingh13 produces reliable results (although both do work for the sample data provided in post #1 in this thread). For details, see the next post.
Last edited by Don Cragun; 07-31-2016 at 04:51 AM.
Reason: Add note.
Looking closer at the scripts rdrtx1 and RavinderSingh13 suggested, I find that I do not understand how they restrict comparisons of values for various points to values within a single study. Although all three of our suggestions produce similar output for the sample data given in post #1 in this thread (differing only in the order of lines in the output), if we change the sample input data to:
where all values in each study are identical, I believe the output should just be the header line. With this sample input, my suggestion produces the output:
the code rdrtx1 suggested produces the output:
and the code suggested by RavinderSingh13 produces the output:
Did I misunderstand the requirements?
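The edge case raised here can be checked with a small Python sketch (again, not any poster's awk script). It relies on the same assumption stated earlier in the thread, that all lines for a given study are adjacent, and it only compares values within one study. The (study, point, value) layout, the function name `count_differing_pairs_sorted`, and the test records are hypothetical.

```python
from collections import Counter
from itertools import combinations, groupby
from operator import itemgetter


def count_differing_pairs_sorted(records):
    """Streaming variant: every study's rows must be adjacent in the input.

    records: iterable of (study, point, value) tuples (assumed layout).
    Returns a Counter holding only pairs with at least one differing study.
    """
    counts = Counter()
    # groupby() only merges neighbouring rows, hence the adjacency requirement.
    for _, rows in groupby(records, key=itemgetter(0)):
        values = {point: value for _, point, value in rows}
        for a, b in combinations(sorted(values), 2):
            if values[a] != values[b]:
                counts[(a, b)] += 1
    return counts


# Hypothetical input: values are identical within each study,
# even though p1 and p3 change value from one study to the next.
same_values = [
    ("s1", "p1", "v1"), ("s1", "p2", "v1"), ("s1", "p3", "v1"),
    ("s2", "p1", "v2"), ("s2", "p3", "v2"),
]
print(dict(count_differing_pairs_sorted(same_values)))  # {}
```

With this input no pair is reported, which matches the expectation that only a header line would be printed. A script that compared values across studies would instead count pairs such as (p1,p3).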