Help with calculating frequency of specific word in a string


 
# 1  
Old 12-03-2011

Input file:
Code:
#read_1
AWEAWQQRZZZQWQQWZ
#read_2
ZZAQWRQTWQQQWADSADZZZ
#read_3
POGZZZZZZADWRR
.
.

Desired output file:
Code:
#read_1 3
#read_1 1
#read_2 2
#read_2 3
#read_3 6
.
.

Perl script that I have tried:
Code:
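#!/usr/bin/perl
use strict;
use warnings;

# Records are separated by "#", the header marker used in the input file.
$/ = "#";

while (<>) {
	next if $. == 1;    # skip the empty record before the first "#"
	chomp;              # remove the trailing "#"

	my ($header, @other) = split(/\n/, $_);
	my $sequence = join "", @other;

	# Count every "Z" in the sequence (a total, not a per-run count).
	my @letters = split "", $sequence;
	my $counter = 0;

	foreach my $base (@letters) {
		$counter++ if $base eq 'Z';
	}
	print "#$header $counter\n";
}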

Command I have tried:
Code:
[home@user]perl count.pl input_file.txt > input_file.stats
[home@user]cat input_file.stats
#read_1 4
#read_2 5
#read_3 6
.
.

My goal is to report the length of each consecutive run of "Z" in every string.
However, my script only gives the total number of "Z" in each string.

Thanks for any advice.
# 2  
Old 12-03-2011
Code:
perl -l -0043 -ne '/(.*)\n(.*)/;$h=$1;$s=$2;while($s=~/Z+/g){print "#$h " . length $&}' input_file.txt


Last edited by bartus11; 12-03-2011 at 06:42 AM..
# 3  
Old 12-04-2011
Thanks, bartus11.
Your Perl script worked perfectly.
Would you mind explaining what "-l -0043" and "/(.*)\n(.*)/;" at the beginning of your Perl script mean?
Many thanks for your advice.
# 4  
Old 12-04-2011
From http://perldoc.perl.org/perlrun.html:
Code:
-0[octal/hexadecimal] 

specifies the input record separator ($/ ) as an octal or hexadecimal number.

The octal value of the ASCII code for "#" is 043, so "#" now marks the record boundaries instead of newlines. Now to your second question:
Code:
/(.*)\n(.*)/

Within each record, whatever comes before the newline is matched by the first (.*) in the regex, so the header (read_...) goes into $1. Whatever comes after the newline is matched by the second (.*), so the line with the Zs goes into $2.
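
For readers who prefer a script to a one-liner, the same idea can be sketched as below. This is only an illustration, not bartus11's code: the helper names z_run_lengths and report_runs are hypothetical, and the sample data is copied from the input file in post #1. Setting $/ to "#" stands in for -0043, and the explicit chomp does the job that -l does on input (-l also appends a newline to every print).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the length of every consecutive run of "Z"
# in a sequence, in order of appearance.
sub z_run_lengths {
    my ($sequence) = @_;
    my @lengths;
    while ($sequence =~ /(Z+)/g) {
        push @lengths, length $1;
    }
    return @lengths;
}

# Hypothetical helper: read "#"-separated records from a filehandle and
# return one "#header length" line per run of "Z".
sub report_runs {
    my ($fh) = @_;
    local $/ = "#";                  # same effect as -0043
    my @lines;
    while (my $record = <$fh>) {
        chomp $record;               # drop the trailing "#"
        # First capture: header line; second capture: sequence line.
        next unless $record =~ /(.*)\n(.*)/;
        my ($header, $sequence) = ($1, $2);
        push @lines, "#$header $_" for z_run_lengths($sequence);
    }
    return @lines;
}

# Demo on the sample input from post #1, read from an in-memory filehandle.
my $sample = <<'END';
#read_1
AWEAWQQRZZZQWQQWZ
#read_2
ZZAQWRQTWQQQWADSADZZZ
#read_3
POGZZZZZZADWRR
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!\n";
print "$_\n" for report_runs($fh);
close $fh;
```

To run it on a real file, replace the in-memory handle with one opened on input_file.txt. The `next unless` guard skips the empty record before the first "#", so $1 and $2 are only read after a successful match.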