awk to count and rename based on fields


 
# 1  
Old 08-26-2016

In the awk below, using the tab-delimited input, I am trying to count the - symbol in $5 and output the count under the renamed condition ins. I am also trying to count the - symbol in $6 and output the count under the renamed condition del. Finally, I am trying to count the times that both $5 and $6 contain letters, and output that count under the renamed condition snp.

input
Code:
Index    Mutation Call    Start    End    Ref    Alt    Func.refGene    Gene.refGene    ExonicFunc.refGene    Sanger
13    c.[1035-3T>C]+[1035-3T>C]    166170127    166170127    T    C    intronic    SCN2A        
16    c.[2994C>T]+[=]    166210776    166210776    C    T    exonic    SCN2A    synonymous SNV    
19    c.[4914T>A]+[4914T>A]    166245230    166245230    T    A    exonic    SCN2A    synonymous SNV    
20    c.[5109C>T]+[=]    166245425    166245425    C    T    exonic    SCN2A    synonymous SNV    
21    c.[5139C>T]+[=]    166848646    166848646    G    A    exonic    SCN1A    synonymous SNV    
22    c.3152_3153insAACCACT    166892841    166892841    -    AGTGGTT    exonic    SCN1A    frameshift insertion    TP
23    c.2044-5delT    166898947    166898947    A    -    intronic    SCN1A        
25    c.1530_1531insA    166901684    166901684    -    T    exonic    SCN1A    frameshift insertion    FP

current output
Code:
Category  Count
ins       del    2

desired output
Code:
Category  Count
ins           2
del           1
snp           5


Last edited by cmccabe; 08-26-2016 at 02:23 PM.. Reason: fixed format
# 2  
Old 08-26-2016
Post the awk program that you used.
# 3  
Old 08-26-2016
sorry:

awk
Code:
awk -F'\t' '$5=="-"{count++}
            $4=="-"{count++} 
                  END{print "Category","Count"; 
                      print "indel",count+0}' input | # replace nulls with zero
  column -t > count # print out tab-delimited


Last edited by cmccabe; 08-26-2016 at 02:30 PM.. Reason: fixed format
# 4  
Old 08-26-2016
Quote:
Originally Posted by cmccabe
sorry:
awk
Code:
awk -F'\t' '$5=="-"{count++}
            $4=="-"{count++} 
                  END{print "Category","Count"; 
                      print "indel",count+0}' input | # replace nulls with zero
  column -t > count # print out tab-delimited

Hello cmccabe,

Sorry to say, but I am not able to understand it. Here are some questions:

i- What do you mean by renamed ins and del here?
ii- Are you trying to fill any field with the above-mentioned keywords?
iii- I can see the strings del and ins on the 23rd and 25th lines respectively, so is it related to that? Though it is the second column where I see them (assuming the field separator is space or tab here).

Please post more meaningful data samples and output samples, so that we can try to help you.

Thanks,
R. Singh
# 5  
Old 08-26-2016
Code:
awk -F'\t' '$5=="-"{count++} # count - in $5
            $6=="-"{count++} # count - in $6
                  END{print "Category","Count";
                      print "indel",count+0}' out | # count+0 prints 0 instead of an empty string
  column -t > count # align the columns (space-padded)

Quote:
i- What do you mean by renamed ins and del here?
ii- Are you trying to fill any field with the above-mentioned keywords?
iii- I can see the strings del and ins on the 23rd and 25th lines respectively, so is it related to that? Though it is the second column where I see them (assuming the field separator is space or tab here).

i- Since I am just counting -, I am naming each count based on which field was used.
For example:
if $5 contains the -, it is counted and printed as ins
if $6 contains the -, it is counted and printed as del
if both $5 and $6 contain letters, it is counted and printed as snp

ii- I am not filling the fields with data, rather using the data already there to output the result.

iii- those keywords are in that field $2 in this example but that is not always the case.

Thank you :).
# 6  
Old 08-26-2016
Try this:-
Code:
awk -F'\t' '
        NR == 1 {
                print "Category", "Count"
                next
        }
        $5 == "-" {
                ++A["ins"]
        }
        $6 == "-" {
                ++A["del"]
        }
        $5 != "-" && $6 != "-" {
                ++A["snp"]
        }
        END {
                for ( k in A )
                        print k, A[k]
        }
' OFS='\t' file
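
One caveat with the loop above: awk does not define the order in which for ( k in A ) visits the keys, so the rows may not come out as ins, del, snp, and a category that never occurs is not printed at all. Below is a minimal sketch that fixes the order and prints zero counts. The file name sample.tsv, the row helper and the trimmed six-column rows are made up for illustration; the three tests are the same as above.

```shell
# Hypothetical helper: print six arguments as one tab-delimited row.
row() { printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$@"; }

# Build a small sample (assumed file name; rows trimmed from the posted input).
{
  row Index 'Mutation Call' Start End Ref Alt
  row 13 'c.[1035-3T>C]+[1035-3T>C]' 166170127 166170127 T C
  row 16 'c.[2994C>T]+[=]' 166210776 166210776 C T
  row 22 'c.3152_3153insAACCACT' 166892841 166892841 - AGTGGTT
  row 23 'c.2044-5delT' 166898947 166898947 A -
  row 25 'c.1530_1531insA' 166901684 166901684 - T
} > sample.tsv

result=$(awk -F'\t' -v OFS='\t' '
        NR == 1 { next }                        # skip the header row
        $5 == "-" { ++A["ins"] }                # - in Ref
        $6 == "-" { ++A["del"] }                # - in Alt
        $5 != "-" && $6 != "-" { ++A["snp"] }   # letters in both
        END {
                print "Category", "Count"
                n = split("ins del snp", order, " ")    # fixed output order
                for ( i = 1; i <= n; i++ )
                        print order[i], A[order[i]] + 0 # + 0 turns an unset count into 0
        }
' sample.tsv)
printf '%s\n' "$result"
```

For this trimmed sample the counts are ins 2, del 1, snp 2, always in that order.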

# 7  
Old 08-26-2016
Thank you very much :).