Assuming that your input file is always sorted with all lines for a given study adjacent to each other (as in your sample dataset file), you might want to try the following awk script to handle your problem:
Code:
#!/bin/ksh
awk '
BEGIN { # Before reading 1st line from the input file, set output field
        # separator to <tab> and print heading.
    OFS = "\t"
    print "Point1", "Point2", "Number_of_Studies_with_Different_Values"
}
NR == 1 {
    # Skip over the input file header line.
    next
}
$1 != last {
    # If the 1st field has changed, process all of the lines read for the
    # previous value of the 1st field.
    count()
    # Save the current value of the 1st field.
    last = $1
}
{   # Increment the number of lines seen for this 1st field value and save
    # the point name and the value from the current line.
    p[++n] = $2
    v[n] = $3
}
END {   # Process the last 1st field value.
    count()
    # Print the accumulated results.
    for(i in diffs)
        # Uncomment one and only one of the following lines.
        print i, diffs[i] | "sort"  # use this to sort output
        # print i, diffs[i]         # use this for unsorted output
}
function count(    i, j) {
    # Process the set of lines for a given 1st field value.
    # There are "n" lines in the set. For each of the 1st "n-1" lines
    # in this set...
    for(i = 1; i < n; i++)
        # For each of the remaining lines in the set...
        for(j = i + 1; j <= n; j++)
            # If the values for those two points are different...
            if(v[i] != v[j])
                # Increment the number of times this pair of
                # points (with the two points sorted by point
                # names) has had different values.
                diffs[(p[i] < p[j]) ? p[i] OFS p[j] : \
                    p[j] OFS p[i]]++
    # Reset the line counter for the next set.
    n = 0
}' dataset
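To see what the script produces, here is a small run on made-up data. The file names dataset.raw and pairs.out and all of the values are hypothetical; they are not the data from post #1. The run also shows one way to meet the sorted-input assumption when the data lines are not already grouped by study: keep the header line first and sort only the remaining lines on field 1. The awk program is a condensed copy of the script above with the comments and heading removed, and with its output piped through an external sort so the order is predictable.

```shell
#!/bin/sh
# Hypothetical sample input (tab-separated): a header line, then data lines
# whose study ids (field 1) are NOT grouped together.
printf 'Study\tPoint\tValue\nS2\tA\t3\nS1\tA\t1\nS2\tB\t4\nS1\tB\t1\nS2\tC\t4\nS1\tC\t2\n' > dataset.raw

# Keep the header line first and sort only the data lines on field 1,
# so that all lines for a study end up adjacent, as the script assumes.
{ head -n 1 dataset.raw; tail -n +2 dataset.raw | LC_ALL=C sort -k1,1; } > dataset

# Condensed copy of the script above: same pairwise logic, no heading,
# unsorted output piped through an external sort instead.
awk '
BEGIN { OFS = "\t" }
NR == 1 { next }
$1 != last { count(); last = $1 }
{ p[++n] = $2; v[n] = $3 }
END { count(); for (i in diffs) print i, diffs[i] }
function count(    i, j) {
    for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++)
            if (v[i] != v[j])
                diffs[(p[i] < p[j]) ? p[i] OFS p[j] : p[j] OFS p[i]]++
    n = 0
}' dataset | LC_ALL=C sort > pairs.out

# Prints (tab-separated):
# A B 1
# A C 2
# B C 1
cat pairs.out
```

With this data, study S1 has point C differing from both A and B, and study S2 has point A differing from both B and C, so the A/C pair is counted in both studies.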
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
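If you would rather not edit the script by hand on each system, a minimal sketch of that substitution in a POSIX shell picks /usr/xpg4/bin/awk when it exists, falls back to nawk on SunOS, and otherwise uses the awk found on PATH. The variable name AWK is only an illustration; you would then invoke the script's program with "$AWK" in place of awk. As far as I know, the reason for the substitution is that the default /usr/bin/awk on Solaris is the old awk, which lacks user-defined functions such as count().

```shell
#!/bin/sh
# Pick an awk that supports user-defined functions (the script needs count()).
# AWK is a hypothetical variable name used only for this illustration.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk          # POSIX awk on Solaris/SunOS
elif [ "$(uname -s)" = SunOS ] && command -v nawk >/dev/null 2>&1; then
    AWK=nawk                       # "new awk" on Solaris/SunOS
else
    AWK=awk                        # awk on PATH elsewhere
fi

# Quick check: the chosen awk should accept a user-defined function (prints 42).
"$AWK" 'function twice(x) { return 2 * x } BEGIN { print twice(21) }'
```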
Note that, if I understand the requirements correctly, I don't believe the solutions suggested by rdrtx1 or RavinderSingh13 produce reliable results (although both work for the sample data provided in post #1 in this thread). For details, see the next post.
Last edited by Don Cragun; 07-31-2016 at 04:51 AM. Reason: Add note.