Find smallest between replicates ID


 
# 1  
Old 07-23-2014
Hi All,
I need to find, for each replicated ID (column 1), the line with the smallest value in column 3.
Input file:
Code:
a name1 1200
a name2 800
b name1 100
b name2 150
b name3 4

Desired output:

Code:
a name2 800
b name3 4

Do you have any suggestions?

Thank you!
# 2  
Old 07-23-2014
Given what you have learned from your earlier thread, "Output minimum and maximum values for replicates ID", what have you tried to solve this problem on your own?
# 3  
Old 07-23-2014
Hi Don Cragun, and thank you for your reply!
Unfortunately, the command from the previous thread does not work (I resolved that issue on my own with a completely different approach).

The command was:

Code:
awk '{idx=$1 FS $2}FNR==1{a3[idx]=$3}{a3[idx]=(a3[idx]>$3)?a3[idx]:$3;a4[idx]=($4>a4[idx])?$4:a4[idx]} END{for(i in a3)print i,a3[i],a4[i]}' myFile

With myFile containing:

Code:
a x 1 4
a x 2 5
b x 5 10
b x 6 12
c x 8 15
c x 6 12

The output is:

Code:
a x 2 5
b x 6 12
c x 8 15

As you can see, column 3 does not report the smallest value.
I tried changing the command a little, without success.

Giuliano
# 4  
Old 07-23-2014
Yes. In your previous thread you wanted to print the maximum value for the 4th column and the minimum value for the 3rd column. Now you have an easier job; you just want to print the line that has the minimum value for the 3rd column (and there is no 4th column).

How did you try to change that code to get what you need for this problem?

What did it do?
# 5  
Old 07-23-2014
I tried this one (assuming a file with 2 columns, the first being the ID):

Code:
awk '{idx=$1}FNR==1{a3[idx]=$2}{a3[idx]=(a3[idx]>$2)?$2:a3[idx]} END{for(i in a3)print i,a3[i]}' myFile

But I have a problem: the command only outputs the first line!
# 6  
Old 07-23-2014
This might help you

Code:
awk '{
	# The ID that may be replicated is in column 1
	col = $1

	# The value to be compared is in column 3
	value = $3

	# Count how many times each ID occurs
	rep[col]++

      }
      {
	# If this ID is not yet an index of array a, or the minimum
	# stored for it is greater than the current value ($3),
	# record the current line as the new minimum for this ID
	if(!(col in a) || a[col] > value)
	{
		a[col] = value

		# Line to print in the END block; you can also store $1 OFS $2 etc.
		# It is indexed by ID (not by value) so that two IDs sharing
		# the same minimum do not overwrite each other
		output[col] = $0
	}

      }
   END{
	# Loop through the rep array
	for(i in rep)
	{
		# A count greater than 1 means the ID is replicated,
		# so print the line saved for that ID
		if(rep[i]>1 )
			print output[i]
	}
      }'    file

# 7  
Old 07-23-2014
Quote:
Originally Posted by giuliangiuseppe
I tried this one (assuming a file with 2 columns, the first being the ID):

Code:
awk '{idx=$1}FNR==1{a3[idx]=$2}{a3[idx]=(a3[idx]>$2)?$2:a3[idx]} END{for(i in a3)print i,a3[i]}' myFile

But I have a problem: the command only outputs the first line!
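The FNR==1{a3[idx]=$2} portion of your code (marked in red in the original post, and the only portion that stores an initial value in the array a3) is only executed when FNR==1 (i.e., only when you are looking at the 1st line of the current input file). For every other ID, a3[idx] starts out empty, which compares as 0, so a3[idx]>$2 is never true for positive values and the element stays empty. So, when you print the array at the end, only the ID from that 1st line has a value.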
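
For what it is worth, here is a minimal sketch of how that two-column attempt could be repaired. It keeps the array name a3 from the post, but the sample data is a hypothetical two-column version of the input in this thread, and the test on ($1 in a3) is my own substitution for the FNR==1 initialization, not something from the original command. The final sort is only there because the order of for(i in a3) is unspecified.

```shell
# Hypothetical two-column sample (ID, value) fed to awk on standard input
result=$(printf '%s\n' 'a 1200' 'a 800' 'b 100' 'b 150' 'b 4' |
awk '
# Store the value the first time an ID is seen, or when a smaller one turns up.
# Adding 0 forces a numeric comparison.
!($1 in a3) || $2 + 0 < a3[$1] + 0 { a3[$1] = $2 }
END { for (i in a3) print i, a3[i] }
' | sort)
printf '%s\n' "$result"
```

The point of the change is that every ID gets its own starting value, not just the ID on the 1st line of the file.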

The following uses similar logic to the code provided by Akshay Hegde, but will also print a line for keys that only appear once in your input file:
Code:
awk '
!($1 in d) || f3[$1] > $3 {
	d[$1] = $0
	f3[$1] = $3
}
END {	for(i in d)
		print d[i]
}' myFile

which produces:
Code:
a name2 800
b name3 4
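
As a quick check of the point about keys that only appear once, the sketch below runs the same awk code on the sample data plus one extra line, c name1 7. That extra line is a made-up addition of mine, not part of the original input, and the trailing sort is only there to make the output order predictable.

```shell
# Sample data from the thread plus a hypothetical ID "c" that occurs only once
result=$(printf '%s\n' 'a name1 1200' 'a name2 800' 'b name1 100' \
                       'b name2 150' 'b name3 4' 'c name1 7' |
awk '
# Keep the whole line for an ID the first time it is seen,
# or when its 3rd field is smaller than the minimum saved so far
!($1 in d) || f3[$1] > $3 {
	d[$1] = $0
	f3[$1] = $3
}
END {	for(i in d)
		print d[i]
}' | sort)
printf '%s\n' "$result"
```

With this input, the line for c should be printed along with the minimum lines for a and b.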

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
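
If the order of the output matters, a different technique (not the one used above) is to let sort do the comparison and have awk keep only the first line it sees for each ID. This is just a sketch, assuming whitespace-separated fields and a numeric 3rd column as in the sample input; the array name seen is arbitrary.

```shell
# Sort by ID, then numerically by the 3rd field, so that the smallest
# value for each ID comes first; awk then prints only the first line per ID
result=$(printf '%s\n' 'a name1 1200' 'a name2 800' 'b name1 100' \
                       'b name2 150' 'b name3 4' |
sort -k1,1 -k3,3n |
awk '!seen[$1]++')
printf '%s\n' "$result"
```

Unlike the for(i in d) loop, this prints the IDs in sorted order, at the cost of sorting the whole file first.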