How many studies have unequal values for each pair? Post: 302978473

Forum: Shell Programming and Scripting. Post 302978473 by Don Cragun on Sunday 31st of July 2016 01:40:10 AM

Assuming that your input file is always sorted with all lines for a given study adjacent to each other (as in your sample dataset file), you might want to try the following awk script to handle your problem:

Code:

#!/bin/ksh
awk '
BEGIN {	# Before reading 1st line from the input file, set output field
	# separator to <tab> and print heading.
	OFS = "\t"
	print "Point1", "Point2", "Number_of_Studies_with_Different_Values"
}
NR == 1 {
	# Skip over the input file header line.
	next
}
$1 != last {
	# If the 1st field has changed, process all of the lines read for the
	# previous value of the 1st field.
	count()

	# Save the current value of the 1st field.
	last = $1
}
{	# Increment the number of lines seen for this 1st field value and save
	# the point name and the value from the current line.
	p[++n] = $2
	v[n] = $3
}
END {	# Process the last 1st field value.
	count()

	# Print the accumulated results.
	for(i in diffs)
		# Uncomment one and only one of the following lines.
		print i, diffs[i] | "sort"	# use this to sort output
		# print i, diffs[i]		# use this for unsorted output
}
function count(		i, j) {
	# Process the set of lines for a given 1st field value.
	# There are "n" lines in the set.  for each of the 1st "n-1" lines
	# in this set...
	for(i = 1; i < n; i++)
		# for each of the remaining lines in the set...
		for(j = i + 1; j <= n; j++)
			# if the values for those two points are different...
			if(v[i] != v[j])
				# Increment the number of times this pair of
				# points (with the two points sorted by point
				# names) has had different values.
				diffs[(p[i] < p[j]) ?  p[i] OFS p[j] : \
				    p[j] OFS p[i]]++

	# Reset the line counter for the next set.
	n = 0
}' dataset

which, with your sample data, produces the output:

Code:

Point1	Point2	Number_of_Studies_with_Different_Values
p1	p2	1
p1	p4	1
p1	p5	2
p2	p3	1
p2	p4	2
p2	p5	1
p3	p4	1
p3	p5	2

or, with unsorted output instead of sorted output:

Code:

Point1	Point2	Number_of_Studies_with_Different_Values
p3	p4	1
p3	p5	2
p2	p3	1
p2	p4	2
p2	p5	1
p1	p2	1
p1	p4	1
p1	p5	2
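
The script above depends on the assumption stated at the start of this post: all lines for a given study have to be adjacent. If your real file is not already grouped that way, one possible approach (a sketch only, not tested against your data) is to keep the header line first and sort the remaining lines on the 1st field before running the awk script. The function name group_by_study and the file name dataset.grouped are made up for this example:

```shell
# Hypothetical helper: print the header line unchanged, then print the
# data lines sorted on the 1st field so that all lines for a given
# study end up adjacent to each other.
group_by_study() {
	head -n 1 "$1"
	tail -n +2 "$1" | sort -k 1,1
}

# Example use ("dataset" is the input file named in the script above):
#	group_by_study dataset > dataset.grouped
# and then name dataset.grouped instead of dataset on the last line of
# the awk script.
```

Sorting may change the order of the points within a study, but that should not affect the counts since the awk script orders the two point names in each pair before counting.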

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
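
If the same script has to run on both Solaris/SunOS and other systems, a small wrapper along the following lines might save editing the script by hand. This is only a sketch: the function name pick_awk and the variable name AWK are made up for this example, and it assumes that any system providing /usr/xpg4/bin/awk should prefer it:

```shell
# Choose an awk that supports user-defined functions (which the script
# above needs for its count() function).
pick_awk() {
	if [ -x /usr/xpg4/bin/awk ]; then
		# Solaris/SunOS: standards-conforming awk.
		echo /usr/xpg4/bin/awk
	elif command -v nawk >/dev/null 2>&1; then
		# nawk is available on this system.
		echo nawk
	else
		# Everywhere else, assume the default awk is good enough.
		echo awk
	fi
}
AWK=$(pick_awk)

# Example use: replace the word awk on the 2nd line of the script
# above with "$AWK" (including the double quotes).
```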

Note that if I understand the requirements correctly, I don't believe the code suggested by rdrtx1 or RavinderSingh13 produces reliable results (although both suggestions do work for the sample data provided in post #1 in this thread). For details, see the next post.

Last edited by Don Cragun; 07-31-2016 at 04:51 AM. Reason: Add note.
