Perl - Finding the average "frequency" of occurrence of duplicate lines
Hello,
I am working on a Perl script that tries to find the average "frequency" with which lines are duplicated. So far I've only managed to count how many times each line is repeated. The code is as follows:
Which produces this type of output:
Now, what I want is the average number of lines between repetitions of a given line ("every how many lines a certain line is repeated"). Is it possible to keep some sort of record and then just calculate the average at the end?
I actually have another way to calculate this frequency. In the original file being read, the first field is a Unix timestamp (which I "cut out" when counting the duplicate lines). So I thought it would also be possible to keep a record of the "time between repetitions" and then compute an average at the end. Of course this would mean keeping a record for each duplicate line, which seems like a rather intricate operation. An example of the lines is:
The first field is the Unix timestamp. The first, second and third fields are ignored when comparing duplicate lines.
Any help is deeply appreciated.
---------- Post updated 08-09-11 at 07:49 AM ---------- Previous update was 08-08-11 at 08:40 AM ----------
Is this really not doable in Perl the way I asked? Is there any other way to do it? Any ideas, please?
I'd like the average of "every how many lines a certain line is repeated". So say that the line
is repeated first every 2 lines, then the next time it appears after 10 lines, then 2 again, then 4, etc etc.
Is it possible to keep a record of this and make an average? For each duplicate line, of course.
Anyway, if I'm still not being clear enough, please do ask.
Thanks!
I had proposed using the first field in my file to keep a record of time (since it's a Unix timestamp), i.e. finding the "inter-occurrence" time instead of the "every how many lines" record, but I don't know if this would be more complicated.
Is it possible to keep a record of this and make an average?
I believe it is possible. But I'm not sure I understand the task (sorry, English is not my native language). Please give examples of your input and the desired output. Maybe it would be enough if you give the desired output for my INPUTFILE:
All lines: 9
Lines between a: 1, 2, 0 (or maybe you need to remember line numbers - 1, 3, 6, 7?) so what output?
b: 2 - ?
c: ? (only one occurrence) - ?
d: 0 - ?
Thanks for your reply.
Yeah what I want is something like what you said. So, for your example input file, the output would be:
the first field being the contents of the line being repeated, the second field the number of times found in the file, the third field being the average of "every how many lines it is repeated". So for example for 'a', first it appears after 2 lines, then 3 lines then 1 line. So the average of this makes 2 lines. Then for 'b' and 'd' since they are only duplicated once, there won't be a need to make an average. And, since 'c' is never repeated, then the average is just '0' (or could be blank, it doesn't matter).
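For what it's worth, here is a minimal sketch of the "every how many lines" average described above. It is not the code from this thread. The sample input (a b a c b a a d d) is my reconstruction from the line numbers quoted earlier (a on lines 1, 3, 6 and 7), and the name `gap_report` is made up for the example. It remembers the last line number at which each line was seen and sums the distances:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns one "content count average" string per distinct line,
# in order of first appearance. Whole lines are compared here.
sub gap_report {
    my @lines = @_;
    my (%count, %last, %gap_sum, @order);
    my $n = 0;
    for my $line (@lines) {
        $n++;
        # Remember the order in which distinct lines first show up.
        push @order, $line unless $count{$line}++;
        # From the second occurrence on, add the distance to the previous one.
        $gap_sum{$line} += $n - $last{$line} if defined $last{$line};
        $last{$line} = $n;
    }
    return map {
        my $gaps = $count{$_} - 1;                        # number of repetitions
        my $avg  = $gaps ? $gap_sum{$_} / $gaps : 0;      # 0 if never repeated
        sprintf '%s %d %g', $_, $count{$_}, $avg;
    } @order;
}

# Sample input reconstructed from the line numbers quoted above (an assumption).
print "$_\n" for gap_report(qw(a b a c b a a d d));
# prints:
# a 4 2
# b 2 3
# c 1 0
# d 2 1
```

Only three small hashes are kept per distinct line, so nothing like a full history of positions is needed.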
On the other hand, how about keeping track of the timestamp and subtracting it to make the "time between repetitions" and then making an average? That was my original idea but I don't know how to keep track of this time, per each repeated line. The output in this case would be something like:
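A possible sketch of the timestamp variant, under some assumptions of mine: fields are whitespace-separated, the timestamp is the first field, everything after it is the text to compare, and the file is in time order. The timestamps below are invented and `interval_report` is a hypothetical name. Because the gaps between consecutive occurrences add up to "last timestamp minus first timestamp", it should be enough to remember just those two values per line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns "content count average_seconds" per distinct line,
# in order of first appearance. Assumes input sorted by time.
sub interval_report {
    my @lines = @_;
    my (%count, %first, %last, @order);
    for my $line (@lines) {
        chomp $line;
        # Field 0 is the timestamp; the rest is the comparison key (assumption).
        my ($ts, $key) = split ' ', $line, 2;
        next unless defined $key;                 # skip blank or one-field lines
        push @order, $key unless $count{$key}++;
        $first{$key} = $ts unless defined $first{$key};
        $last{$key}  = $ts;
    }
    return map {
        my $gaps = $count{$_} - 1;
        # Consecutive gaps telescope to (last - first).
        my $avg  = $gaps ? ($last{$_} - $first{$_}) / $gaps : 0;
        sprintf '%s %d %g', $_, $count{$_}, $avg;
    } @order;
}

# Made-up timestamps, one second apart, for illustration only.
my $t = 1312800000;
my @sample = map { ++$t . " $_" } qw(a b a c b a a d d);
print "$_\n" for interval_report(@sample);
```

With lines one second apart the averages come out the same as the line-based ones, which is a handy sanity check.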
Ok. Is this algorithm right (there is 1 second difference between lines)?
I just tried the algorithm and it works for the example input file but for my actual file, there are a couple of problems.
The input lines in my original file are of the form:
So for the comparison of duplicates I want to ignore fields 0, 1 and 2.
How can I adjust your code to this?
I tried changing this part that only considers the first field:
then I changed it to
but it doesn't seem to work, and I don't think I quite get what the code does. Could you please explain? Thanks!
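One likely reason a change like that fails: picking a single field by index compares only that one field, not "everything except fields 0, 1 and 2". A hedged sketch of one way to build the comparison key, assuming the fields are separated by whitespace (the real line format is not shown here, so the sample lines and the name `parse_line` are made up). `split` with a limit of 4 stops splitting after the third field, so the fourth element holds the whole remainder of the line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns (timestamp, key): field 0 and everything from field 3 onwards.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # Limit 4: fields 0, 1, 2 are split off, the remainder stays in one piece.
    my ($ts, undef, undef, $key) = split ' ', $line, 4;
    return ($ts, $key);
}

# Hypothetical lines that differ only in the ignored fields.
for my $line ("1312800001 hostA 10 GET /index.html\n",
              "1312800005 hostB 22 GET /index.html\n") {
    my ($ts, $key) = parse_line($line);
    print "$ts => [$key]\n";
}
# prints:
# 1312800001 => [GET /index.html]
# 1312800005 => [GET /index.html]
```

The key can then be used as the hash key for the counting and averaging, with the timestamp kept separately for the "time between repetitions".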