Scripting help to identify words count in lines


 
# 1  
Old 11-10-2011

Hi everybody,

I have this biological problem to solve:


Code:
>Id.1
ACGTACANNNNNNNNNNNACGTGCNNNNNNNACTGTGGT
>Id.2
ACGGGT
>Id.3
ACGTNNNNNNNNNNNNACTGGGGG
>Id.4
ACGTGCGNNNNNNNNGGTCANNNNNNNNCGTGCAAANNNNN
........
....

These are nucleotide sequences containing "NNNN..." blocks, always of the same length but in different positions (we have about 300.000 different >Id sequences). A "NNNN..." block may occur once, twice, three times, or at most four times in a sequence (or not at all). My question is:

Is there any way to count how many >Id sequences contain one "NNNN..." block, and how many contain 2, 3, 4, or 0, out of the total of 300.000?

I mean something that would end up looking like this, for example, for 300.000 >Ids:

Code:
100.000 have one "NNNN..."
200.000 have two "NNNN..."
50.000 have three "NNNN..."
30.000 have four   "NNNN.."
20.000 don't have any "NNNN.."

The lines always follow the same pattern:

>Id..
letters, with or without a variable number of "NNNN..." blocks.

(Within a "NNNN..." block the number of Ns is always the same; the blocks are adaptors, found in the lines among the other letters A, C, G, T.)

I hope I have been clear and that someone can help me...

Please...!!!
Moderator's Comments:
Please use code tags when posting data and code samples!

Last edited by vgersh99; 11-10-2011 at 10:43 AM.. Reason: code tags, please!
# 2  
Old 11-10-2011
Code:
nawk -F'[N]+' '/^[^>]/{a[NF-1]++}END{for(i in a) print a[i] " have " i " NNs"}' myFile
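
For anyone who finds the one-liner dense, the same idea can be spelled out as a commented sketch. It relies on the field separator being a run of Ns, so a line with k runs splits into k+1 fields. It assumes one sequence per line after each >Id header, as in the sample from post #1. The function name count_n_blocks and the file sample.fa are made up for illustration, and plain awk is used in place of nawk.

```shell
# count_n_blocks is a hypothetical helper name, not part of the original answer.
count_n_blocks() {
    awk -F'N+' '
        /^>/ { next }               # skip the >Id header lines
        NF   { blocks[NF - 1]++ }   # k runs of N split a line into k+1 fields
        END  {
            for (k in blocks)
                print blocks[k] " have " k " N blocks"
        }
    ' "$@" | sort -k3,3n            # order the rows by block count
}

# Made-up sample file, modelled on the data in post #1.
cat > sample.fa <<'EOF'
>Id.1
ACGTACANNNNNNNNNNNACGTGCNNNNNNNACTGTGGT
>Id.2
ACGGGT
>Id.3
ACGTNNNNNNNNNNNNACTGGGGG
>Id.4
ACGTGCGNNNNNNNNGGTCANNNNNNNNCGTGCAAANNNNN
EOF

count_n_blocks sample.fa
```

A run of Ns at the very start or end of a line still counts, because awk keeps the empty field on that side of the separator. The `NF` guard only skips completely empty lines.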

# 3  
Old 11-10-2011
Amazing!!! The best answer... you're great!!! Thank you very much for your help. It works!!!
# 4  
Old 11-10-2011
Try this for a start:
Code:
perl -ne '$count{s/N+//g || 0}++ if /^[^>]/;END{for $i (keys %count){print "$count{$i} have $i NNNNN...\n";}}' file
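
The Perl one-liner counts substitutions rather than fields. A rough awk counterpart of that idea, under the same one-sequence-per-line assumption, uses the return value of gsub(), which is the number of replacements it made. Since the original question asked for counts "over the total", this sketch also prints each count as a share of all sequences. The function name n_block_shares and the file sample2.fa are illustrative only.

```shell
# n_block_shares is a hypothetical helper name used only for this sketch.
n_block_shares() {
    awk '
        /^>/ || !NF { next }            # skip headers and empty lines
        {
            k = gsub(/N+/, "")          # gsub returns how many runs it replaced
            count[k]++
            total++
        }
        END {
            for (k in count)
                printf "%d of %d (%.1f%%) have %d N blocks\n",
                       count[k], total, 100 * count[k] / total, k
        }
    ' "$@" | sort -k6,6n                # order the rows by block count
}

# Made-up sample file for illustration.
cat > sample2.fa <<'EOF'
>Id.1
ACGTNNNNACGT
>Id.2
ACGGGT
>Id.3
NNNNACGTNNNN
>Id.4
ACGNNNNTTT
EOF

n_block_shares sample2.fa
```

On a real file the call would simply be `n_block_shares myFile`; sequences wrapped over several lines would need to be joined first, which this sketch does not attempt.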

# 5  
Old 11-10-2011
Thanks, Bartus!!! Same result and same speed!!! Very good in Perl!!!