Find biggest values on replicates


 
# 1  
Old 07-02-2014

Dear All
I was wondering if any of you know how to resolve an issue I have run into.
In particular, I have a file like this:
Code:
ENSMUSG01 chr1 77837902 77853530
ENSMUSG02 chr2 18780447 18811972
ENSMUSG02 chr2 18780453 18811626
ENSMUSG02 chr2 18807356 18811987
ENSMUSG03 chr3 142575634 142576538
ENSMUSG03 chr3 142576507 142578095
ENSMUSG03 chr3 142576296 142576910
ENSMUSG03 chr3 142575558 142578120
ENSMUSG03 chr3 142575529 142578143

What I would like to obtain is a file like this, in which, for each replicated ID (column 1), only the line with the biggest length (and its coordinates) is reported:
Code:
ENSMUSG00000000003 chrX 77837902 77853530
ENSMUSG00000000028 chr16 18780447 18811972
ENSMUSG00000000031 chr7 142575529 142578143

I hope that my explanation was clear.

Thank you for your help!

Giuliano
# 2  
Old 07-02-2014
How are you determining which is 'biggest'? It's not clear from your example.
# 3  
Old 07-02-2014
Hi, it is not clear to me what you are looking for and also, how do you get from the input file that you specified to for example ENSMUSG00000000003 and chrX ?
# 4  
Old 07-02-2014
Hi
I am so sorry!!! I was in a hurry before and I did not check my message.

So my input file looks like this:

Code:
ENSMUSG01 chr1 77837902 77853530 
ENSMUSG02 chr2 18780447 18811972 
ENSMUSG02 chr2 18780453 18811626 
ENSMUSG02 chr2 18807356 18811987 
ENSMUSG03 chr3 142575634 142576538 
ENSMUSG03 chr3 142576507 142578095 
ENSMUSG03 chr3 142576296 142576910 
ENSMUSG03 chr3 142575558 142578120 
ENSMUSG03 chr3 142575529 142578143

And my desired output should look like this:

Code:
ENSMUSG01 chr1 77837902 77853530 
ENSMUSG02 chr2 18780447 18811972 
ENSMUSG03 chr3 142575529 142578143

What I am looking for is this: for each ID (first column), calculate the difference between column 4 and column 3, and keep only the line in which that difference is the largest.
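
To make the criterion concrete, here is a minimal sketch that prints each record followed by its length (column 4 minus column 3). It assumes whitespace-separated fields as in the sample; the function name `show_lengths` is only an illustrative name, and the three records are the ENSMUSG02 lines from the input above.

```shell
# show_lengths is a hypothetical helper name, not part of the original post.
# It appends the length (field 4 minus field 3) to every input line.
show_lengths() {
  awk '{ print $0, $4 - $3 }' "$@"
}

# The three ENSMUSG02 records from the sample input.
show_lengths <<'EOF'
ENSMUSG02 chr2 18780447 18811972
ENSMUSG02 chr2 18780453 18811626
ENSMUSG02 chr2 18807356 18811987
EOF
# prints:
# ENSMUSG02 chr2 18780447 18811972 31525
# ENSMUSG02 chr2 18780453 18811626 31173
# ENSMUSG02 chr2 18807356 18811987 4631
```

The first record has the largest length (31525), which is why it is the ENSMUSG02 line in the desired output.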
Thank you again, and if you have further questions, do not hesitate to post a reply!

Giuliano

Last edited by Scrutinizer; 07-02-2014 at 01:23 PM.. Reason: CODE tags
# 5  
Old 07-02-2014
Code:
awk '{diff=$4-$3; diff=diff >= 0 ?  diff:-diff; if (diff > diffs[$1]) {diffs[$1]=diff;lines[$1]=$0}} END {for (i in lines) {print lines[i]}}' file

If field 4 is always greater than field 3, you can shorten this a bit by skipping the absolute-value calculation. Also, because `for (i in lines)` visits array elements in an unspecified order, the output is not guaranteed to preserve the order of the records in the file.
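
If the original order matters and the input is not necessarily grouped by ID, one possible approach is to remember the order in which each ID is first seen and print in that order. This is a minimal sketch, assuming whitespace-separated fields as in the sample. The function name `longest_per_id` and the array names `best`, `line` and `order` are only illustrative, not something from the posts above.

```shell
# longest_per_id is a hypothetical helper name used for illustration only.
# For each ID (field 1) it keeps the line with the largest |$4 - $3| and
# prints the kept lines in the order in which each ID first appeared.
longest_per_id() {
  awk '
    # Absolute difference between the end and start coordinates.
    { d = $4 - $3; if (d < 0) d = -d }

    # First time this ID is seen: record its position and take this line.
    !($1 in best) { order[++n] = $1; best[$1] = d; line[$1] = $0; next }

    # Later lines replace the stored one only if strictly longer.
    d > best[$1] { best[$1] = d; line[$1] = $0 }

    # Walk the IDs by numeric index, so the output order is deterministic.
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  ' "$@"
}

# Ungrouped sample built from records in the thread.
longest_per_id <<'EOF'
ENSMUSG02 chr2 18780453 18811626
ENSMUSG01 chr1 77837902 77853530
ENSMUSG02 chr2 18780447 18811972
EOF
# prints:
# ENSMUSG02 chr2 18780447 18811972
# ENSMUSG01 chr1 77837902 77853530
```

On ties, this sketch keeps the first of the equally long lines; use `d >= best[$1]` instead if the last one is preferred. With a real file, call it as `longest_per_id file`.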
# 6  
Old 07-02-2014
Code:
akshay@Aix:/tmp$ awk '{d=$4-$3; if(A[$1]<d){ A[$1]=d; B[$1]=$0}}END{for(i in B)print B[i]}' file
ENSMUSG01 chr1 77837902 77853530
ENSMUSG02 chr2 18780447 18811972
ENSMUSG03 chr3 142575529 142578143

# 7  
Old 07-02-2014
To preserve order, presuming an input file grouped by the first field:
Code:
awk '{d=$4-$3} $1!=p{if(p)print s; p=$1; m=0} d>m{s=$0; m=d} END{print s}' file


Last edited by Scrutinizer; 07-02-2014 at 01:37 PM..