Subtracting values from 2 columns in a file

10-24-2011

Hello,
I have a file with 5 columns that looks like this:

A1BG chr19 + 58863335 58866549
A1BG chr19 - 58858171 58864865
A2LD1 chr13 - 101182417 101186056
A2LD1 chr13 - 101182417 101241046
A2M chr12 - 9220303 9268558
A2ML1 chr12 + 8975149 9029377

I want to subtract the value in column 4 from the value in column 5 and write the result to a 6th column, so that the file looks like this (6th column = value in 5th column minus value in 4th column):

A1BG chr19 + 58863335 58866549 3214
A1BG chr19 - 58858171 58864865 6694
A2LD1 chr13 - 101182417 101186056 3639
A2LD1 chr13 - 101182417 101241046 58629
A2M chr12 - 9220303 9268558 48255
A2ML1 chr12 + 8975149 9029377 54228

Then I want to get rid of the duplicates. (I already used the sort and uniq commands and I still have some left, e.g. A1BG and A2LD1.)
I want my final file to look like this, with only the longest entry kept for each gene (below, only the largest values for genes A1BG and A2LD1 are kept):

A1BG chr19 - 58858171 58864865 6694
A2LD1 chr13 - 101182417 101241046 58629
A2M chr12 - 9220303 9268558 48255
A2ML1 chr12 + 8975149 9029377 54228

Please help!
Thanks so much!

wolf_blue

10-24-2011

AWK

Hi,

Try this one,

Code:

awk '{a=$5-$4;print $0,a;}' Input_File
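As a hedged sketch of the same subtraction, the variant below skips any line with fewer than five fields, so blank or malformed lines do not get a bogus sixth column. The names sample_data and add_length are made up for illustration, and the data is the sample from the first post.

```shell
# sample_data (hypothetical helper): prints the sample rows from the question.
sample_data() {
    cat <<'EOF'
A1BG chr19 + 58863335 58866549
A1BG chr19 - 58858171 58864865
A2LD1 chr13 - 101182417 101186056
A2LD1 chr13 - 101182417 101241046
A2M chr12 - 9220303 9268558
A2ML1 chr12 + 8975149 9029377
EOF
}

# add_length (hypothetical helper): append column 5 minus column 4 to each line.
# NF >= 5 skips lines that lack the two coordinate columns.
# Reads the files named as arguments, or standard input if none are given.
add_length() {
    awk 'NF >= 5 { print $0, $5 - $4 }' "$@"
}

sample_data | add_length
```

With a real file, the call would be add_length myfile.txt > myfile_with_length.txt (both file names are placeholders).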

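To print only the first line for each gene name (this assumes lines for the same gene are adjacent, and it keeps the first occurrence rather than the longest), try this one: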

Code:

awk 'BEGIN{re=0;}{if(re != $1 ){a=$5-$4;print $0,a;re=$1;}}' Input_File
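If the goal is the longest entry per gene rather than the first one, a single awk pass can remember the largest length seen so far for each gene name. This is only a sketch under the assumption that fields are separated by whitespace; sample_data and longest_per_gene are hypothetical names, and the final sort is there only because awk does not guarantee the order of a for-in loop.

```shell
# sample_data (hypothetical helper): prints the sample rows from the question.
sample_data() {
    cat <<'EOF'
A1BG chr19 + 58863335 58866549
A1BG chr19 - 58858171 58864865
A2LD1 chr13 - 101182417 101186056
A2LD1 chr13 - 101182417 101241046
A2M chr12 - 9220303 9268558
A2ML1 chr12 + 8975149 9029377
EOF
}

# longest_per_gene (hypothetical helper): keep, for each gene name in column 1,
# the line whose length (column 5 minus column 4) is largest.
longest_per_gene() {
    awk '
        NF >= 5 {
            len = $5 - $4
            # Remember the line when the gene is new or this length is larger.
            if (!($1 in best) || len > best[$1]) {
                best[$1] = len
                line[$1] = $0 " " len
            }
        }
        END { for (g in line) print line[g] }
    ' "$@" | LC_ALL=C sort -k1,1    # order the result by gene name
}

sample_data | longest_per_gene
```

Unlike the previous one-liner, this does not require the lines for one gene to be next to each other in the input.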

Cheers,
Ranga

rangarasan

10-24-2011

The following script prints, for each gene, only the line with the highest value in the new final field.

Code:

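perl -e 'while(<>){
   chomp;
   @fields=split(" ",$_);              # split on any run of whitespace
   next unless @fields >= 5;           # skip blank or short lines
   $len=$fields[4] - $fields[3];       # length = column 5 minus column 4
   # keep this line if the gene is new or this length beats the stored one
   if (!exists $genes{$fields[0]} || $len > $genes{$fields[0]}->[5]){
      $genes{$fields[0]}=[@fields[0..4],$len];
   }
}
for (sort keys %genes){                # one line per gene, sorted by name
   print join (" ", @{$genes{$_}}),"\n";
}' tmp.dat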

Skrynesaver

10-24-2011

To fulfill wolf_blue's request, I modified Ranga's code:

Code:

$ nawk '{a=$5-$4;print $0,a}' infile | sort +5nr | nawk '!x[$1]++'
A2LD1 chr13 - 101182417 101241046 58629
A2ML1 chr12 + 8975149 9029377 54228
A2M chr12 - 9220303 9268558 48255
A1BG chr19 - 58858171 58864865 6694
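One portability note, offered as a sketch rather than a correction: sort +5nr is the old zero-based key syntax, which newer sort implementations may reject, and nawk is not installed everywhere. Assuming a POSIX awk and sort, the same pipeline can be written with -k6,6nr (sort numerically, descending, on the 6th field only). The names sample_data and longest_first are hypothetical.

```shell
# sample_data (hypothetical helper): prints the sample rows from the question.
sample_data() {
    cat <<'EOF'
A1BG chr19 + 58863335 58866549
A1BG chr19 - 58858171 58864865
A2LD1 chr13 - 101182417 101186056
A2LD1 chr13 - 101182417 101241046
A2M chr12 - 9220303 9268558
A2ML1 chr12 + 8975149 9029377
EOF
}

# longest_first (hypothetical helper):
#   1. append the length (column 5 minus column 4) as a 6th column,
#   2. sort by that 6th column, largest first,
#   3. keep only the first line seen for each gene name in column 1.
longest_first() {
    awk '{ print $0, $5 - $4 }' "$@" |
        sort -k6,6nr |
        awk '!seen[$1]++'
}

sample_data | longest_first
```

Because the lines arrive sorted by length, the first line seen for each gene is also its longest one. With a real file the call would be longest_first myfile.txt, where myfile.txt is a placeholder.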

jayan_jay

10-24-2011

Sorry, I'm confused on this one: where do I enter my file name? At the end, like this?

Code:

nawk '{a=$5-$4;print $0,a}' infile | sort +5nr | nawk '!x[$1]++' myfile.txt

Am I doing this correctly?
Sorry, I'm really new to Unix.

wolf_blue

10-24-2011

Code:

nawk '{a=$5-$4;print $0,a}' infile | sort +5nr | nawk '!x[$1]++'

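Your file name goes where infile appears in the first nawk command (it was highlighted in red in the original post), not at the end of the line.

The pipe operator "|" then passes the output of the previous command to the next command as its input.

So the output of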

Code:

nawk '{a=$5-$4;print $0,a}' infile

is given as input to

Code:

sort +5nr

Then the output of

Code:

nawk '{a=$5-$4;print $0,a}' infile | sort +5nr

is given as input to

Code:

nawk '!x[$1]++'
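The last stage is the least obvious, so here is a rough sketch of what it does, spelled out in long form. x[$1] counts how many times the value in column 1 has been seen; x[$1]++ yields the count from before this line, so !x[$1]++ is true only the first time a value appears, and awk prints a line whenever the pattern is true. The function name first_per_key and the three test lines are made up for illustration.

```shell
# first_per_key (hypothetical helper): long-form equivalent of awk '!x[$1]++'.
first_per_key() {
    awk '{
        seen_before = x[$1]   # empty (false) the first time this key appears
        x[$1]++               # count this occurrence
        if (!seen_before) print
    }'
}

# The second "a" line is dropped because "a" was already seen.
printf '%s\n' 'a 3' 'b 2' 'a 1' | first_per_key
```

This is why the sort has to come first: the filter keeps whichever line for a gene arrives first, so the lines must already be ordered with the largest length on top.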

ctsgnb
