Perl- Finding average "frequency" of occurrence of duplicate lines
Post 302545611 by acsg on Tuesday 9th of August 2011 02:17:10 AM
Quote:
Originally Posted by yazu
I believe it is possible. But I'm not sure I understand the task (sorry, English is not my native language). Please give examples of your input and the desired output. Maybe it would be enough if you give the desired output for my INPUTFILE:
All lines: 9
Lines between a: 1, 2, 0 (or maybe you need to remember line numbers - 1, 3, 6, 7?) so what output?
b: 2 - ?
c: ? (only one occurrence) - ?
d: 0 - ?


Thanks for your reply.
Yes, what I want is close to what you described. For your example input file, the output would be:

Code:
a- 4 2 
b- 2 3
c- 1 0
d- 2 1



The first field is the content of the repeated line, the second is the number of times it occurs in the file, and the third is the average distance, in lines, between consecutive occurrences. For 'a', for example, the second occurrence comes 2 lines after the first, the next comes 3 lines later, and the last 1 line later, so the average is (2 + 3 + 1) / 3 = 2 lines. 'b' and 'd' each repeat only once, so their single gap is already the average. Since 'c' is never repeated, its average is just '0' (or could be blank, it doesn't matter).
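To make the rule concrete, here is a rough sketch of the bookkeeping. It is written in Python rather than Perl, only to show the logic; the same idea should map onto a Perl hash keyed by line content. The function name `average_gaps` and the sample list are made up for illustration. The sample is one possible 9-line input that matches the counts quoted above ('a' on lines 1, 3, 6 and 7), since the actual INPUTFILE is not shown in this post.

```python
def average_gaps(lines):
    """Map each distinct line to (count, average gap in lines between repeats)."""
    stats = {}  # content -> [count, last line number seen, sum of gaps]
    for line_no, raw in enumerate(lines, start=1):
        key = raw.rstrip("\n")
        if key in stats:
            entry = stats[key]
            entry[0] += 1
            entry[2] += line_no - entry[1]  # distance from the previous occurrence
            entry[1] = line_no
        else:
            stats[key] = [1, line_no, 0]
    # A line seen only once has no gaps, so its average is reported as 0.
    return {
        key: (count, gap_sum / (count - 1) if count > 1 else 0)
        for key, (count, _last, gap_sum) in stats.items()
    }


# Hypothetical input consistent with the numbers discussed in the thread.
sample = ["a", "b", "a", "c", "b", "a", "a", "d", "d"]
for key, (count, avg) in average_gaps(sample).items():
    print(f"{key}- {count} {avg:g}")
```

For a real file, something like `average_gaps(open("INPUTFILE"))` should work the same way, because the function strips the trailing newline from each line itself.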

Alternatively, how about recording a timestamp for each occurrence, subtracting consecutive timestamps to get the "time between repetitions", and then averaging those? That was my original idea, but I don't know how to keep track of this time for each repeated line. The output in this case would be something like:

Code:
a- 4 0.05
b- 2 0.89
c- 1 0
d- 2 0.06



Here the last field is the average time between repetitions, in seconds.
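One possible way to track the time per repeated line is to remember, for each distinct line, when it was first and last seen: the gaps between consecutive occurrences always add up to last minus first, so nothing else needs to be stored. The sketch below is again Python rather than Perl, and it assumes the lines arrive live (for example on a pipe) so that the read time is meaningful. The names `stamp` and `average_intervals` and the sample timestamps are invented for illustration.

```python
import time


def stamp(stream, clock=time.time):
    """Pair each incoming line with the time (in seconds) at which it was read."""
    for raw in stream:
        yield clock(), raw.rstrip("\n")


def average_intervals(events):
    """Map each distinct line to (count, average seconds between repeats)."""
    stats = {}  # content -> [count, first seen, last seen]
    for seconds, key in events:
        if key in stats:
            stats[key][0] += 1
            stats[key][2] = seconds
        else:
            stats[key] = [1, seconds, seconds]
    # The gaps between consecutive occurrences sum to (last - first).
    return {
        key: (count, (last - first) / (count - 1) if count > 1 else 0)
        for key, (count, first, last) in stats.items()
    }


# Made-up (seconds, line) pairs standing in for a live stream.
events = [(0.0, "a"), (0.5, "b"), (1.0, "a"), (1.5, "c"), (2.0, "b"), (4.0, "a")]
for key, (count, avg) in average_intervals(events).items():
    print(f"{key}- {count} {avg:.2f}")
```

With a live source this could be driven by `average_intervals(stamp(sys.stdin))` after `import sys`. If the file has already been written, the read times only reflect how fast the script runs, so in that case the timestamps would have to come from the lines themselves.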

Thanks!

 
