Choosing between repeated entries based on the "absolute values" of a column


 
# 1  
Old 10-07-2013

Hello, I am looking for a way to choose between repeated entries (rows with the same GeneID in column 1) by keeping the row whose column 3 (FC) value has the largest absolute value. For example, if the same gene ID has FC values -2 and 1, the output should be the row with -2. Kindly help.

Code:
GeneID          Description    FC 
LOC_Os12g44390     Os.8335.1   -0.00377            
LOC_Os12g44390     Os.8335.1   -0.00877       
LOC_Os12g44390     Os.8335.2   -0.02457     
LOC_Os12g44330     Os.17376.2   -0.00111       
LOC_Os12g44330     Os.17376.2   -0.00716       
LOC_Os12g44320     Os.55718.1   -0.13515       
LOC_Os12g44320     Os.55718.1   -0.07858       
LOC_Os12g44320     OsAffx.20117.1   0.005586

Desired output:
Code:
LOC_Os12g44390     Os.8335.2   -0.02457
LOC_Os12g44330     Os.17376.2   -0.00716    
LOC_Os12g44320     Os.55718.1   -0.13515

# 2  
Old 10-07-2013
Put this into "script.awk":
Code:
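# Absolute value helper (awk has no built-in abs)
function abs(x){
  return ((x < 0.0) ? -x : x)
}
# Skip the header line; remember the row with the largest |FC| per GeneID
NR>1{
  # "!($1 in max)" also keeps a gene whose only FC value is 0
  if (!($1 in max) || max[$1]<abs($3)){
    max[$1]=abs($3)
    a[$1]=$2   # Description
    b[$1]=$3   # FC, with its original sign
  }
}
# Print one tab-separated row per GeneID (awk does not guarantee the order)
END{
  for (i in a){
    print i"\t"a[i]"\t"b[i]
  }
}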

Then run:
Code:
awk -f script.awk file
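
If the output should follow the order in which each GeneID first appears in the input, the same idea can be sketched as a shell function. This is only an illustration: the function name pick_max_abs and the tiny test file are made up for the example, and the input is assumed to be whitespace-separated with one header line.

```shell
# Sketch: keep, for each GeneID (column 1), the row whose FC (column 3)
# has the largest absolute value; rows are printed in first-seen order.
# pick_max_abs is a hypothetical helper name, not part of the thread.
pick_max_abs() {
  awk '
    function abs(x) { return (x < 0) ? -x : x }
    NR > 1 {                                  # skip the header line
      if (!($1 in best)) order[++n] = $1      # remember first-seen order
      if (!($1 in best) || abs($3) > best[$1]) {
        best[$1] = abs($3)                    # largest |FC| so far
        row[$1]  = $1 "\t" $2 "\t" $3         # row to print, sign kept
      }
    }
    END { for (i = 1; i <= n; i++) print row[order[i]] }
  ' "$1"
}

# Usage with a small made-up file (values are illustrative only)
tmp=$(mktemp)
printf '%s\n' 'GeneID Description FC' \
  'geneA probe1 -2' 'geneA probe2 1' \
  'geneB probe3 0.5' 'geneB probe4 -0.25' > "$tmp"
pick_max_abs "$tmp"
rm -f "$tmp"
```

Keeping a separate order[] array costs a little memory but avoids depending on the unspecified iteration order of "for (i in array)".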

This User Gave Thanks to bartus11 For This Post:
# 3  
Old 10-07-2013
Another way with sort:
Code:
$ sed -n '2,$p' file | sort -n -k1,1 -k3,3 | sort -ur -k1,1

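sed suppresses the first line (the column names). Note that the first sort orders rows by the signed FC value, not by its absolute value, so this gives the desired output only when the largest-magnitude FC of each gene is negative, as in the sample data. It also relies on sort -u keeping the first line of each group of equal keys, which GNU sort does but POSIX does not guarantee.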
Regards.
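
A sort-based variant that compares magnitudes could first prefix each row with |FC|, sort on that helper column, and then strip it again. The following is a rough sketch under the same assumptions as the thread's sample (whitespace-separated columns, one header line); the function name max_abs_by_sort is hypothetical.

```shell
# Sketch of a decorate / sort / undecorate pipeline.
# max_abs_by_sort is a hypothetical helper name, not part of the thread.
max_abs_by_sort() {
  # 1. Skip the header and prefix every row with |FC| in fixed notation
  awk 'NR > 1 { a = ($3 < 0) ? -$3 : $3
                printf "%.12f %s %s %s\n", a, $1, $2, $3 }' "$1" |
    # 2. Group by GeneID (now field 2), largest |FC| first within a group
    LC_ALL=C sort -k2,2 -k1,1nr |
    # 3. Keep the first row of each GeneID and drop the helper column
    awk '!seen[$2]++ { print $2 "\t" $3 "\t" $4 }'
}

# Usage: max_abs_by_sort file
```

The output is ordered by GeneID rather than by input order, and the final awk step replaces sort -u so the choice of row does not depend on how sort treats duplicates.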

Last edited by disedorgue; 10-07-2013 at 06:23 PM.. Reason: Replace -r by -n of first sort for numerical sort