I have another problem, which came up after seeing this output.
This could be an entirely different question.
I have a file in the same format as above, but now with 4 positions. At each position a character can be one of two types, type 1 or type 2. For example, position 1 should be either T (type 1) or C (type 2); similarly, position 2 is C (type 1) or T (type 2), position 3 is A (type 1) or G (type 2), and position 4 is T (type 1) or C (type 2).
Below is the input file:
Code:
>A1
TCAT
>A2
CTGC
>A3
TCGC
>A4
TTAT
>A5
TTTT
Based on this, I want to classify every sub-header (>A1, >A2, >A3, >A4, >A5) in the above file so that I know which type it is.
Please let me know how to do this in awk.
The desired output (the part after # is not needed; it is only there to make things clearer):
Code:
A1 1 # all type 1 characters
A2 2 # all type 2 characters
A3 mixed # contains both type 1 and type 2 characters across the 4 positions
A4 mixed # contains both type 1 and type 2 characters across the 4 positions
A5 NA # at least one position has a character that is neither type 1 nor type 2
If you really wanted to, you could embed it into awk itself, like
Code:
T[1,"T"]=1;
T[1,"C"]=2;
...
in the BEGIN section instead, but when there are more than three lines of it, I tend to put that in files. It's just better organization, and there's far less chance of typos than when doing fiddly [] operations over and over.
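For illustration, here is a minimal sketch of what the embedded-table version might look like for the 4-position example above. The function name `classify`, the variable `NPOS`, and the rule that a sequence of the wrong length counts as NA are my own assumptions, not something taken from the thread.

```shell
#!/bin/sh
# Hypothetical helper: reads ">name" / sequence pairs on stdin and prints
# "name 1", "name 2", "name mixed" or "name NA" for each sequence.
classify() {
    awk '
    BEGIN {
        # Lookup table embedded in BEGIN: T[position, character] = type.
        T[1,"T"] = 1; T[1,"C"] = 2
        T[2,"C"] = 1; T[2,"T"] = 2
        T[3,"A"] = 1; T[3,"G"] = 2
        T[4,"T"] = 1; T[4,"C"] = 2
        NPOS = 4    # number of positions (assumed fixed)
    }
    /^>/    { name = substr($0, 2); next }   # header line: keep the name, drop ">"
    NF == 0 { next }                         # skip blank lines
    {
        seen1 = seen2 = bad = 0
        # Assumption: a sequence that is not exactly NPOS long is NA.
        if (length($0) != NPOS) bad = 1
        for (i = 1; i <= NPOS && !bad; i++) {
            c = substr($0, i, 1)
            if (!((i, c) in T)) bad = 1      # character not allowed here
            else if (T[i, c] == 1) seen1 = 1
            else seen2 = 1
        }
        if (bad)                 result = "NA"
        else if (seen1 && seen2) result = "mixed"
        else if (seen1)          result = "1"
        else                     result = "2"
        print name, result
    }'
}

# Demo with the sample input from the question.
printf '>A1\nTCAT\n>A2\nCTGC\n>A3\nTCGC\n>A4\nTTAT\n>A5\nTTTT\n' | classify
```

On the sample input this should give 1, 2, mixed, mixed and NA for A1 through A5. With more positions the BEGIN block grows quickly, which is the point where moving the table into a separate file starts to pay off.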
Hi All, here's a question from a newbie.
I have data like this, which is a set of small DNA sequences separated by newlines:
GAATCCGGAAACAGCAACTTCAAANCA
GTNATTCGGGCCAAACTGTCGAA
TTNGGCAACTGTTAGAGCTCATGCGACA
CCTGCTAAACGAGTTCGAGTTGAANGA
TTNCGGAAGTGGTCGCTGGCACGG
ACNTGCATGTACGGAGTGACGAAACCI... (6 Replies)
Ok say I wanted to count every Y in a data file.
Then set Y as my delimiter so that I can separate my file by taking all the contents that occur BEFORE the first Y and store them in a variable so that I may use this content later on in my program. Then I could do the same thing with the next Y's... (5 Replies)
Hi All,
I need some help in counting the number of letters in a big file with separations.
Following is the file I have
>AB_1
MLKKPIIIGVTGGSGGGKTSVSRAILDSFPNARIAMIQHDSYYKDQSHMSFEERVKTNYDHPLAFDTDFM
IQQLKELLAGRPVDIPIYDYKKHTRSNTTFRQDPQDVIIVEGILVLEDERLRDLMDIKLFVDTDDDIRII... (6 Replies)
I want to list the occurrences of particular characters in a line. My file looks like this:
a,b,c,d
e,f,g
h,y:e,g,y s
f;g,s,w
and I want to count how many commas are in each line so the file in the end looks like this:
a,b,c,d 3
e,f,g 2
h,y:e,g,y s 3
f;g,s,w ... (2 Replies)
I have a comma delimited file that roughly has 300 fields. Not all fields are populated.
This file is fed into another system. What I need to do is count the number of characters in each field and produce output similar to this:
1 - 6,2 - 25
The first number is the field and the second... (2 Replies)
Dears,
I would like to count the number of "(" and ")" that occur in a file.
(It is for a syntax-checking script.) I tried to use "grep -c", and this works fine as long as there is only one of the characters I am searching for on a line.
Has anyone an idea how I can count the number of specific characters... (6 Replies)