Post 302572549 by Giorgio C on Thursday 10th of November 2011 09:32:50 AM
Scripting help to identify words count in lines

Hi everybody,

I have this biological problem to solve:


Code:
>Id.1
ACGTACANNNNNNNNNNNACGTGCNNNNNNNACTGTGGT
>Id.2
ACGGGT
>Id.3
ACGTNNNNNNNNNNNNACTGGGGG
>Id.4
ACGTGCGNNNNNNNNGGTCANNNNNNNNCGTGCAAANNNNN
........
....

These are nucleotide sequences containing blocks of "NNNN...", always of the same length but at different positions (we have about 300.000 different >Id sequences). A "NNNN..." block may occur once, twice, three times, or at most four times (or not at all). My question is:

Is there any way to count how many >Id entries contain one "NNNN..." block, how many contain 2, 3, or 4, and how many contain none, out of the total 300.000?

I mean something that in the end would report, for example, out of 300.000 >Ids:

Code:
100.000 have one "NNNN..."
200.000 have two "NNNN..."
50.000 have three "NNNN..."
30.000 have four "NNNN..."
20.000 don't have any "NNNN..."

The lines always follow the same pattern:

>Id..
letters... with or without a variable number of "NNNN..." blocks.

(Within a "NNNN..." block the number of Ns is always the same; the blocks are adaptors that sit among the other letters A, C, G, T in each line.)

I hope I have been clear and that someone can help me...

Please...!!!
Moderator's Comments:
Please use code tags when posting data and code samples!

Last edited by vgersh99; 11-10-2011 at 10:43 AM. Reason: code tags, please!
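
One possible way to get the tally asked for above, as a minimal Python sketch rather than a definitive answer. It assumes that every record starts with a line beginning with ">", that a "NNNN..." block is any uninterrupted run of at least `min_len` Ns (the name `count_n_blocks` and the `min_len` parameter are made up for this illustration; set `min_len` to the real adaptor length), and that two blocks are never directly adjacent, since adjacent blocks would merge into one run.

```python
import re
from collections import Counter

# Sample records copied from the post (the elided "...." lines are left out).
SAMPLE = """\
>Id.1
ACGTACANNNNNNNNNNNACGTGCNNNNNNNACTGTGGT
>Id.2
ACGGGT
>Id.3
ACGTNNNNNNNNNNNNACTGGGGG
>Id.4
ACGTGCGNNNNNNNNGGTCANNNNNNNNCGTGCAAANNNNN
"""


def count_n_blocks(lines, min_len=4):
    """Return a Counter mapping "number of N blocks" -> "number of records".

    min_len is an assumed threshold: a run of Ns shorter than this
    is not counted as a block.
    """
    run = re.compile("N{%d,}" % min_len)
    tally = Counter()
    seq_parts = None  # stays None until the first ">" header is seen

    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_parts is not None:
                tally[len(run.findall("".join(seq_parts)))] += 1
            seq_parts = []
        elif line and seq_parts is not None:
            # Sequence text; joined later so a run split over lines counts once.
            seq_parts.append(line.upper())

    # Close the last record in the input.
    if seq_parts is not None:
        tally[len(run.findall("".join(seq_parts)))] += 1
    return tally


if __name__ == "__main__":
    result = count_n_blocks(SAMPLE.splitlines())
    for blocks in sorted(result):
        print(f'{result[blocks]} have {blocks} "NNNN..." block(s)')
```

For the full data set, the same function can be fed an open file instead of the sample, for example `count_n_blocks(open("reads.fa"))`, where `reads.fa` is only a placeholder file name. The file is read line by line, so about 300.000 records should not be a memory problem.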
 
