Help in modifying existing Perl Script to produce report of dupes Post: 302630437


Posted by gimley on Wednesday 25th of April 2012 09:10:15 PM

04-25-2012


Help in modifying existing Perl Script to produce report of dupes

Hello,
I have a large amount of data with the following structure:
Word=Transliterated word
I have written a Perl script (reproduced below) that goes through the whole file and identifies all dupes on the left-hand side. It successfully creates a new file with two headers: Singletons and Dupes.
I have tried to modify the script so that it also produces a record listing the frequency count of each dupe. Thus, in the sample provided, I would like to know how many times the dupe Albert has been transliterated in different ways. I am providing pseudo-data since the original data is in a foreign script.

Quote:

Albert=albt
Albert=albut
Albert=albat
Mary=mari
Mary=meri
Mary=merry
Mary=marey

The script should give me a report in a separate output file with the following structure:

Quote:

Albert,3,albt,albut,albat
Mary,4,mari,meri,merry,marey
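
One possible way to build a report of this shape is to collect the right-hand values in a hash of arrays keyed by the left-hand word. The following is only a sketch under some assumptions: the sub name build_report is made up for illustration, each line is split at the first "=", and the John=jon line is a hypothetical singleton added to the pseudo-data to show that words with a single transliteration are left out of the report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group transliterations by word, keeping words in first-seen order,
# and return one "word,count,variant1,variant2,..." row per duplicated word.
sub build_report {
    my ($fh) = @_;
    my (%variants, @order);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^([^=]+)=(.+)$/;    # skip lines that are not Word=Value
        my ($word, $translit) = ($1, $2);
        push @order, $word unless exists $variants{$word};
        push @{ $variants{$word} }, $translit;
    }
    my @rows;
    for my $word (@order) {
        my @list = @{ $variants{$word} };
        next if @list < 2;                        # singletons are not reported
        push @rows, join(',', $word, scalar(@list), @list);
    }
    return @rows;
}

# Pseudo-data from the post, plus one hypothetical singleton (John=jon).
my $sample = <<'END';
Albert=albt
Albert=albut
Albert=albat
John=jon
Mary=mari
Mary=meri
Mary=merry
Mary=marey
END

open my $sample_fh, '<', \$sample or die "Cannot open in-memory sample: $!";
print "$_\n" for build_report($sample_fh);
```

Run against the sample, this prints the two lines Albert,3,albt,albut,albat and Mary,4,mari,meri,merry,marey. Because the grouping is done through a hash, the duplicate lines do not have to be next to each other in the input.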

The final output would thus consist of two files:
The output file listing Singletons and Dupes
The report listing the dupes along with their frequency
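
The two outputs could be written in a single pass along the following lines. This is a rough sketch rather than a drop-in replacement for the script: the sub name split_and_report is hypothetical, the handles here point at in-memory strings so the example runs on its own, and for real use they would be opened on files whose names are up to you (for instance open my $out, '>', 'output.txt'). The John=jon line is again an invented singleton.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read Word=Value lines from $in, write the SINGLETONS/DUPES listing to $out
# and the "word,count,variants" report to $rep.
sub split_and_report {
    my ($in, $out, $rep) = @_;
    my (%values, @order);
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /^([^=]+)=(.+)$/;    # assumption: split at the first "="
        my ($word, $value) = ($1, $2);
        push @order, $word unless exists $values{$word};
        push @{ $values{$word} }, $value;
    }
    my @single = grep { @{ $values{$_} } == 1 } @order;
    my @dupe   = grep { @{ $values{$_} } >  1 } @order;

    # Listing file: same two headers as the original script.
    print {$out} "SINGLETONS\n";
    print {$out} "$_=$values{$_}[0]\n" for @single;
    print {$out} "\nDUPES\n";
    for my $word (@dupe) {
        print {$out} "$word=$_\n" for @{ $values{$word} };
    }

    # Report file: one comma-separated row per duplicated word.
    for my $word (@dupe) {
        my @list = @{ $values{$word} };
        print {$rep} join(',', $word, scalar(@list), @list), "\n";
    }
    return;
}

my $input = "John=jon\nAlbert=albt\nAlbert=albut\nAlbert=albat\n";
open my $in,  '<', \$input      or die "Cannot read input: $!";
open my $out, '>', \my $listing or die "Cannot open listing: $!";
open my $rep, '>', \my $report  or die "Cannot open report: $!";
split_and_report($in, $out, $rep);
close $_ for $out, $rep;
print $listing, "---\n", $report;
```

Holding everything in a hash costs memory in proportion to the size of the data, which is the price paid for not needing the input to be sorted.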
I am not very good at generating reports in Perl, hence this request.
The Perl script follows.
Many thanks for the excellent help and advice given.

Code:

#!/usr/bin/perl

$dupes = $singletons = "";    # Accumulators for the two output sections

do {
    # Reset the per-block state at the head of each outer pass
    $dupefound = 0;
    $text = $line = $prevline = $name = $prevname = "";
    do {
        $line = <>;
        # Capture the key (the text before "=") of the current and previous lines
        $line =~ /^(.+)\=.+$/ and $name = $1;
        $prevline =~ /^(.+)\=.+$/ and $prevname = $1;
        if ($name eq $prevname) { $dupefound += 1 }
        $text .= $line;
        $prevline = $line;
    # Stop once a run of same-key lines has ended, or at end of input
    } until ($dupefound > 0 and $text !~ /^(.+?)\=.*?\n(?:\1=.*?\n)+\z/m) or eof;
    # Move the run of same-key lines into $dupes; whatever is left goes to $singletons
    if ($text =~ s/(^(.+?)\=.*?\n(?:\2=.*?\n)+)//m) { $dupes .= $1 }
    $singletons .= $text;
} until eof;
print "SINGLETONS\n$singletons\nDUPES\n$dupes";
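
The substitution near the end of the script depends on a backreference: the key captured from one line must reappear, followed by "=", at the start of each of the next lines. The small sketch below isolates that one step so its effect is easier to see. The sub name extract_first_run and the John and Paul lines are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove the first run of two or more consecutive lines that share a key
# from the referenced text and return that run ('' if there is none).
sub extract_first_run {
    my ($text_ref) = @_;
    # \2 refers back to the key captured by (.+?), so only lines that
    # start with the same key followed by "=" extend the run.
    if ($$text_ref =~ s/(^(.+?)=.*?\n(?:\2=.*?\n)+)//m) {
        return $1;
    }
    return '';
}

my $text = "John=jon\nMary=mari\nMary=meri\nPaul=pol\n";
my $run  = extract_first_run(\$text);
print "Run removed:\n$run";
print "Text left:\n$text";
```

Here the two Mary lines are removed as a run, while the John and Paul lines stay in the text. Note that this pattern only sees duplicates that sit on consecutive lines.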

Last edited by Franklin52; 04-26-2012 at 03:48 AM. Reason: Corrected code tags



