Post 302977602 by cmccabe on Wednesday 20th of July 2016 10:08:03 AM
07-20-2016
awk to combine matches and use a field to adjust coordinates in other fields

I am trying to output a result that uses the data from file to combine specific lines and adjust their coordinates. If $4 matches across several lines, then the last $6 value of those lines is added to $2 and that becomes the new $3. The matching lines are combined into one line made up of $1, then the original $2, then the new $3, then $5. For the cases where only a single line has a given $4, the result depends on its $6 value: if that value is 1, then the original $2 is the new $3 in the result. If that value is anything but 1, then it is added to the original $2, and the sum appears as both $2 and the new $3 in the result (as in the TCF4 line). I hope this is possible. Thanks :).

file
Code:
chrX    110961329    110961512    chrX:110961329-110961512    ALG13    1    7
chrX    110961329    110961512    chrX:110961329-110961512    ALG13    2    7
chrX    110961329    110961512    chrX:110961329-110961512    ALG13    3    7
chrX    110961329    110961512    chrX:110961329-110961512    ALG13    4    5
chrX    110961329    110961512    chrX:110961329-110961512    ALG13    5    4
chr2    50573818    50574097    chr2:50573818-50574097    NRXN1    268    9
chr2    50573818    50574097    chr2:50573818-50574097    NRXN1    269    8
chr2    50573818    50574097    chr2:50573818-50574097    NRXN1    270    7
chr2    50573818    50574097    chr2:50573818-50574097    NRXN1    271    7
chrX    135080256    135080354    chrX:135080256-135080354    SLC9A6    1    16
chr18    53298518    53298629    chr18:53298518-53298629    TCF4    11    1

desired output result
Code:
chrX    110961329    110961334    ALG13
chr2    50573818    50573822    NRXN1
chrX    135080256    135080256    SLC9A6
chr18    53298529    53298529    TCF4

Currently, I use

Code:
awk 'BEGIN {OFS="\t"}; {print $1,$2,$3,$5}' file | sort -u > result

but that only collapses the duplicate lines with sort -u and leaves $3 unadjusted, so the result is misleading. Thanks.
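
A possible starting point, written as a minimal sketch that follows the rule exactly as it is worded above. It assumes the columns are whitespace-separated, that lines sharing a $4 value are the ones to combine, and that "last $6" means the $6 on the final matching line. The array names (count, last, order and so on) are made up for illustration, and the printf step only recreates the sample input so the example runs on its own.

```shell
# Recreate the sample input from the post as a tab-delimited file named "file".
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chrX 110961329 110961512 chrX:110961329-110961512 ALG13 1 7 \
  chrX 110961329 110961512 chrX:110961329-110961512 ALG13 2 7 \
  chrX 110961329 110961512 chrX:110961329-110961512 ALG13 3 7 \
  chrX 110961329 110961512 chrX:110961329-110961512 ALG13 4 5 \
  chrX 110961329 110961512 chrX:110961329-110961512 ALG13 5 4 \
  chr2 50573818 50574097 chr2:50573818-50574097 NRXN1 268 9 \
  chr2 50573818 50574097 chr2:50573818-50574097 NRXN1 269 8 \
  chr2 50573818 50574097 chr2:50573818-50574097 NRXN1 270 7 \
  chr2 50573818 50574097 chr2:50573818-50574097 NRXN1 271 7 \
  chrX 135080256 135080354 chrX:135080256-135080354 SLC9A6 1 16 \
  chr18 53298518 53298629 chr18:53298518-53298629 TCF4 11 1 > file

awk 'BEGIN { OFS = "\t" }
!($4 in count) { order[++n] = $4 }   # remember each $4 key in first-seen order
{
    count[$4]++                      # how many lines share this $4
    chr[$4] = $1; start[$4] = $2; gene[$4] = $5
    last[$4] = $6                    # $6 of the most recent matching line
}
END {
    for (i = 1; i <= n; i++) {
        k = order[i]
        if (count[k] > 1)            # several lines: keep $2, new $3 = $2 + last $6
            print chr[k], start[k], start[k] + last[k], gene[k]
        else if (last[k] + 0 == 1)   # single line with $6 == 1: new $3 = original $2
            print chr[k], start[k], start[k], gene[k]
        else                         # single line, $6 != 1: $2 + $6 in both columns
            print chr[k], start[k] + last[k], start[k] + last[k], gene[k]
    }
}' file > result
```

With the sample input this gives 110961334 for ALG13, 135080256 for SLC9A6 and 53298529 (in both columns) for TCF4, which matches the desired output. For NRXN1 it gives 50574089 (50573818 + 271), whereas the desired output shows 50573822, which is $2 plus the number of matching lines (4). If that is what is intended, replacing last[k] with count[k] in the first branch reproduces the desired output for all four genes.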
 
