Perl- Finding average "frequency" of occurrence of duplicate lines Post: 302545611


Post 302545611 by acsg on Tuesday 9th of August 2011 02:17:10 AM

08-09-2011


Quote:

Originally Posted by yazu

I believe it is possible. But I'm not sure I understand the task (sorry, English is not my native language). Please give examples of your input and the desired output. Maybe it would be enough if you give the desired output for my INPUTFILE:
All lines: 9
Lines between a: 1, 2, 0 (or maybe you need to remember line numbers - 1, 3, 6, 7?) so what output?
b: 2 - ?
c: ? (only one occurrence) - ?
d: 0 - ?

Thanks for your reply.
Yes, what I want is something like what you described. For your example input file, the output would be:

Code:

a- 4 2 
b- 2 3
c- 1 0
d- 2 1

The first field is the content of the repeated line, the second field is the number of times it occurs in the file, and the third field is the average of "every how many lines it is repeated". For example, 'a' appears again after 2 lines, then after 3 lines, then after 1 line, so the average is 2 lines. 'b' and 'd' are each duplicated only once, so there is nothing to average and the single gap is used as-is. Since 'c' is never repeated, its average is just '0' (or it could be blank, it doesn't matter).
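A minimal Perl sketch of this line-count version, assuming the whole input fits in memory. The sub name `repeat_stats` and the sample input are assumptions: the sample is reconstructed from the figures quoted above (9 lines, 'a' on lines 1, 3, 6 and 7), not copied from the original INPUTFILE. It relies on the fact that the gaps between consecutive occurrences add up to (last line number - first line number), so only those two positions and the count need to be kept per line.

```perl
use strict;
use warnings;

# Returns one "content- count average" string per distinct line,
# in order of first appearance. (Hypothetical helper, not from the thread.)
sub repeat_stats {
    my @lines = @_;
    my (%count, %first, %last, @order);
    my $n = 0;
    for my $line (@lines) {
        $n++;
        unless (exists $first{$line}) {
            $first{$line} = $n;      # line number of the first occurrence
            push @order, $line;
        }
        $last{$line} = $n;           # line number of the latest occurrence
        $count{$line}++;
    }
    my @report;
    for my $line (@order) {
        my $repeats = $count{$line} - 1;
        # Consecutive gaps sum to (last - first), so no gap list is needed.
        my $avg = $repeats ? ($last{$line} - $first{$line}) / $repeats : 0;
        push @report, "$line- $count{$line} $avg";
    }
    return @report;
}

# Sample input reconstructed from the numbers in this post (an assumption).
my @sample = qw(a b a c b a a d d);
print "$_\n" for repeat_stats(@sample);
```

To run it on a real file, replacing the last two statements with `chomp(my @lines = <>); print "$_\n" for repeat_stats(@lines);` should be enough.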

Alternatively, how about recording a timestamp each time a line is seen, subtracting consecutive timestamps to get the "time between repetitions", and then averaging those? That was my original idea, but I don't know how to keep track of this time for each repeated line. The output in this case would be something like:

Code:

a- 4 0.05
b- 2 0.89
c- 1 0
d- 2 0.06

The last field is the average time between repetitions, in seconds.
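One possible way to keep a time per repeated line is a pair of hashes keyed by the line content, sketched below. This assumes the lines arrive over time (for example through a pipe) and are stamped when they are read, which the post does not spell out. The sub names `record` and `average_interval` and the demo timestamps are made up for illustration. `Time::HiRes` is a core module that provides a sub-second `time`.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my (%count, %first_seen, %last_seen, @order);

# Record one occurrence of $line. $now is optional and defaults to the
# current high-resolution time; passing it explicitly keeps the demo repeatable.
sub record {
    my ($line, $now) = @_;
    $now = time() unless defined $now;
    unless (exists $first_seen{$line}) {
        $first_seen{$line} = $now;
        push @order, $line;
    }
    $last_seen{$line} = $now;
    $count{$line}++;
}

# Average seconds between repetitions; 0 for a line seen only once.
sub average_interval {
    my ($line) = @_;
    my $repeats = $count{$line} - 1;
    return $repeats ? ($last_seen{$line} - $first_seen{$line}) / $repeats : 0;
}

# Demo with hypothetical timestamps (in seconds).
record('a', 10.00);
record('b', 10.25);
record('a', 10.50);
record('a', 11.50);

printf "%s- %d %.2f\n", $_, $count{$_}, average_interval($_) for @order;
```

In a live run, the demo calls would be replaced by a loop such as `while (my $l = <STDIN>) { chomp $l; record($l); }`, so that each line is stamped at the moment it is read.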

Thanks!

acsg


