#!/usr/bin/perl
# senhia83.pl
use strict;
use warnings;
my %group_first_line; # first line seen for each group
my %group_values; # first value seen for each group
# read every file line by line
while(<>) {
chomp; # remove the trailing newline, if any
# extract group and value
my ($group, $value) = (split)[1,2];
# if the group has not been seen yet, record its
# first line and value
if ( not exists $group_first_line{$group} ) {
$group_first_line{$group} = $_;
$group_values{$group} = $value;
next; # move on to the next input line
}
# if the value differs from the first one seen, mark the group as "missing"
if ($value ne $group_values{$group}) {
$group_first_line{$group} =~ s/\S+$/missing/; # replace the last field, whatever characters it holds
}
}
# display the result
for my $group (keys %group_values) {
print "$group_first_line{$group}\n";
}
Sorting is expensive.
The real issue is that state must be kept until all the data has been read, and that can take a lot of memory.
Sorting might help reduce memory use: if the lines of each group arrive together, the loop can process a group, print it, and reset its state before moving on. But we might just be trading one burden for another.
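To make the sorting idea concrete, here is a minimal sketch, not the approach used in the scripts in this thread. It assumes the input has already been grouped on the second column (for example with sort -k2,2), so only one group has to be held in memory at a time. The name collapse_sorted and the callback argument are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: lines of the same group are adjacent (input pre-sorted on column 2).
# Reads from filehandle $fh and calls $emit->($id, $group, $value) once per group.
sub collapse_sorted {
    my ($fh, $emit) = @_;
    my @current;    # (id, group, value) of the group currently being read

    while (my $line = <$fh>) {
        chomp $line;
        my ($id, $group, $value) = split ' ', $line;
        next unless defined $value;    # skip blank or short lines

        if (@current and $current[1] eq $group) {
            # same group as the previous line: flag a conflicting value
            $current[2] = "missing" if $current[2] ne $value;
            next;
        }

        # a new group starts: flush the previous one and start over
        $emit->(@current) if @current;
        @current = ($id, $group, $value);
    }
    $emit->(@current) if @current;    # flush the last group
    return;
}

# Act as a filter when run directly with piped input, e.g.
#   sort -k2,2 input.txt | perl collapse_sorted.pl
unless (caller() or -t STDIN) {
    collapse_sorted(\*STDIN, sub { print join("\t", @_), "\n" });
}
```

Memory use stays constant no matter how many groups there are, but the cost moves into the external sort, which is the trade-off mentioned above.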
Here's a version that reduces the memory footprint by eliminating the second hash. It also drops the regex substitution and stops touching a group once it has been marked as missing.
Hopefully that helps.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my %records;
while(<>) {
chomp;
my ($id, $group, $value) = split;
if ( not exists $records{$group} ) {
$records{$group} = [$id, $group, $value];
next;
}
next if $records{$group}->[2] eq "missing";
if ($records{$group}->[2] ne $value) {
$records{$group}->[2] = "missing";
}
}
# join the fields with tabs ($, would not affect an array interpolated in a string)
for my $group (keys %records) {
print join("\t", @{$records{$group}}), "\n";
}
Last edited by Aia; 10-15-2014 at 07:47 PM..
Reason: grammar
Aia, your Perl script works great. Can it be modified slightly to use a tab-delimited input file?
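For what it's worth, a bare split already splits on any whitespace, tabs included, so the script above should accept tab-delimited input as long as no field contains a space. If fields can contain spaces, one option is to split on tabs explicitly. The following is only a sketch under that assumption; the name collapse_tab and the first-seen output order are additions for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reads tab-delimited "id<TAB>group<TAB>value" lines from $fh and returns
# one tab-joined line per group, in the order the groups were first seen.
sub collapse_tab {
    my ($fh) = @_;
    my (%records, @order);

    while (my $line = <$fh>) {
        chomp $line;
        # split on tabs only, so fields may contain spaces;
        # the -1 limit keeps trailing empty fields
        my ($id, $group, $value) = split /\t/, $line, -1;
        next unless defined $value;    # skip blank or short lines

        if (not exists $records{$group}) {
            $records{$group} = [$id, $group, $value];
            push @order, $group;       # remember first-seen order (an addition)
            next;
        }
        # a different value anywhere in the group marks it as missing
        $records{$group}[2] = "missing" if $records{$group}[2] ne $value;
    }
    return map { join "\t", @{ $records{$_} } } @order;
}

# Act as a filter when run directly with piped input, e.g.
#   perl collapse_tab.pl < input.tsv
unless (caller() or -t STDIN) {
    print "$_\n" for collapse_tab(\*STDIN);
}
```

The output is tab-delimited as well, so the result can be fed to other tab-aware tools.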
Hello senhia83,
Kindly try the following code. I have tested it with your input file as well as with my own test input file. I hope this helps, and I will be happy if it works for you.