Hello,
I have a file with 5 columns that looks like this:
A1BG chr19 + 58863335 58866549
A1BG chr19 - 58858171 58864865
A2LD1 chr13 - 101182417 101186056
A2LD1 chr13 - 101182417 101241046
A2M chr12 - 9220303 9268558
A2ML1 chr12 + 8975149 9029377
I want to subtract the value in column 4 from the value in column 5 and write the result to a 6th column, so that the file looks like this (6th column = value in 5th column minus value in 4th column):
A1BG chr19 + 58863335 58866549 3214
A1BG chr19 - 58858171 58864865 6694
A2LD1 chr13 - 101182417 101186056 3639
A2LD1 chr13 - 101182417 101241046 58629
A2M chr12 - 9220303 9268558 48255
A2ML1 chr12 + 8975149 9029377 54228
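For reference, here is a minimal sketch of that first step in Python. It assumes the columns are separated by whitespace and that columns 4 and 5 are always integers; the function name add_length_column is just made up for illustration.

```python
def add_length_column(lines):
    """Append (column 5 - column 4) to each line as a sixth column."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # assumption: blank or short lines can be skipped
        length = int(fields[4]) - int(fields[3])
        out.append(" ".join(fields[:5] + [str(length)]))
    return out


# The sample rows from this post.
sample = [
    "A1BG chr19 + 58863335 58866549",
    "A1BG chr19 - 58858171 58864865",
    "A2LD1 chr13 - 101182417 101186056",
    "A2LD1 chr13 - 101182417 101241046",
    "A2M chr12 - 9220303 9268558",
    "A2ML1 chr12 + 8975149 9029377",
]

for row in add_length_column(sample):
    print(row)
```

To run it on a real file, the list could be replaced by the lines read from that file, e.g. with open(path) and a loop over the file object.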
Then I want to get rid of the duplicates. I already used the sort and uniq commands, but some genes are still listed more than once (e.g. A1BG, A2LD1).
I want my final file to look like this, with only the longest entry kept for each gene (below, only the largest values for genes A1BG and A2LD1 are kept):
A1BG chr19 - 58858171 58864865 6694
A2LD1 chr13 - 101182417 101241046 58629
A2M chr12 - 9220303 9268558 48255
A2ML1 chr12 + 8975149 9029377 54228
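As far as I understand, uniq only drops lines that are completely identical, which would explain why genes with different coordinates survive it. A rough sketch of the filtering I am after, in Python, assuming whitespace-separated columns with the gene name in column 1; the function name longest_per_gene is hypothetical, and on a tie it simply keeps the first row seen:

```python
def longest_per_gene(lines):
    """Keep, for each gene (column 1), the row with the largest column 5 - column 4."""
    best = {}  # gene -> (length, first five fields); dicts keep insertion order
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # assumption: blank or short lines can be skipped
        gene = fields[0]
        length = int(fields[4]) - int(fields[3])
        # Replace the stored row only if this one is strictly longer.
        if gene not in best or length > best[gene][0]:
            best[gene] = (length, fields[:5])
    return [" ".join(f + [str(n)]) for n, f in best.values()]


# The sample rows from this post.
sample = [
    "A1BG chr19 + 58863335 58866549",
    "A1BG chr19 - 58858171 58864865",
    "A2LD1 chr13 - 101182417 101186056",
    "A2LD1 chr13 - 101182417 101241046",
    "A2M chr12 - 9220303 9268558",
    "A2ML1 chr12 + 8975149 9029377",
]

for row in longest_per_gene(sample):
    print(row)
```

Genes come out in the order they first appear in the input, so a sorted input stays sorted.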
Please help!
Thanks so much!