Compare within same group


 
# 8  
Old 10-14-2014
In case you do not mind using Perl:

Code:
#!/usr/bin/perl
# senhia83.pl

use strict;
use warnings;

my %group_first_line; # first line seen for each group
my %group_values; # first value seen for each group

# read every input file line by line
while(<>) {
    chomp; # remove the trailing newline, if any
    # extract group and value
    my ($group, $value) = (split)[1,2];
    # if the group has not been seen yet, record its
    # first line and value
    if ( not exists $group_first_line{$group} ) {
        $group_first_line{$group} = $_;
        $group_values{$group} = $value;
        next; # go on to the next line
    }
    # if the value differs from the first one seen, replace it with "missing"
    if ($value ne $group_values{$group}) {
        $group_first_line{$group} =~ s/\s\w+?$/ missing/;
    }
}
# display the result
for my $group (keys %group_values) {
    print "$group_first_line{$group}\n";
}


Usage
Save code as senhia83.pl and run.
Code:
perl senhia83.pl file
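
One caveat: `keys` returns hash keys in no particular order, so the groups may print in a different order than they appear in the file. If input order matters, one option is to remember each group the first time it is seen. A minimal sketch, assuming the same three-column layout; the `summarize` sub, the `@order` array and the embedded sample lines are hypothetical additions for illustration, not part of the script above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes input lines and returns one line per group,
# in first-seen order, with the value replaced by "missing" when the
# group contains more than one distinct value.
sub summarize {
    my @lines = @_;
    my (%first_line, %first_value, @order);

    for my $line (@lines) {
        chomp $line;
        my ($group, $value) = (split ' ', $line)[1,2];
        if ( not exists $first_line{$group} ) {
            $first_line{$group}  = $line;
            $first_value{$group} = $value;
            push @order, $group;    # remember the first-seen order
            next;
        }
        if ( $value ne $first_value{$group} ) {
            # keep the delimiter, replace only the last field
            $first_line{$group} =~ s/(\s)\S+$/${1}missing/;
        }
    }
    return map { $first_line{$_} } @order;
}

# Sample lines made up for this sketch
my @sample = (
    "name2 id1group1 value1",
    "name4 id1group1 value2",
    "name1 id2group1 value2",
    "name2 id2group1 value2",
    "name1 id1group2 value1",
    "name2 id1group2 value2",
);
print "$_\n" for summarize(@sample);
```

The extra array costs one more entry per group, so it is only worth it when the order of the output matters.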

# 9  
Old 10-14-2014
Hi Ravinder,

I tested your code with the following dataset

Code:
 
$ cat test3
name2 id1group1 value1
name4 id1group1 value2
name1 id2group1 value2
name2 id2group1 value2
name4 id2group1 value2
name1 id1group2 value1
name2 id1group2 value2

What I am getting

Code:
 
name4 id1group1 missing
name1 id2group1 value2
name2 id1group2 missing

The first column is not correct.
What I should get

Code:
 
name2 id1group1 missing
name1 id2group1 value2
name1 id1group2 missing

Aia, your Perl script works great. Can it be modified slightly to use a tab-delimited input file?
# 10  
Old 10-14-2014
Quote:
Originally Posted by senhia83
[...]
Aia, your Perl script works great. Can it be modified slightly to use a tab-delimited input file?
It already uses a tab, or any other run of whitespace, as the delimiter.

This portion does the job:
Code:
# extract group and value
    my ($group, $value) = (split)[1,2];

Now, if you want to split on tabs only, change the split to the following:
Code:
(split '\t')[1,2];

Also change the substitution so that it writes a tab before "missing":
Code:
$group_first_line{$group} =~ s/\s\w+?$/\tmissing/;

---------- Post updated at 10:21 AM ---------- Previous update was at 10:14 AM ----------

Better yet, just change the substitution to the following, which keeps whatever delimiter was already there:
Code:
$group_first_line{$group} =~ s/(\s)\w+?$/$1missing/;
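
To see why capturing the delimiter helps: `(\s)` remembers whichever whitespace character preceded the last field, and `$1` puts it back, so the same substitution works for space-delimited and tab-delimited lines. A small sketch; the `mark_missing` sub and the sample records are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the last field with "missing",
# keeping whatever whitespace character came before it.
sub mark_missing {
    my ($line) = @_;
    $line =~ s/(\s)\w+?$/${1}missing/;
    return $line;
}

my $tabbed = "name2\tid1group1\tvalue1";
my $spaced = "name2 id1group1 value1";

# Splitting on tabs only, so a field could contain spaces
my ($group, $value) = (split /\t/, $tabbed)[1,2];
print "$group / $value\n";

print mark_missing($tabbed), "\n";   # tabs are preserved
print mark_missing($spaced), "\n";   # spaces are preserved
```

Note that `\w` only matches letters, digits and underscores, so a value such as 1.5 would be left untouched; `\S+` in place of `\w+?` is a more forgiving choice if the values can contain other characters.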

# 11  
Old 10-15-2014
Can the code be made more efficient? Will it help if the data is sorted by the second column? It has been churning through 140 million records for some time now.
# 12  
Old 10-15-2014
Sorting is expensive.
The issue here is that state must be kept until all the data has been read, and that takes a lot of memory.
Sorting might help reduce the memory use if, inside the loop, the lines of one group can be processed and printed and the hash then reset. But we might just be trading one burden for another.

Here's a version that reduces the memory footprint by eliminating the second hash, drops the regex search, and stops comparing values for a group once it has been marked as missing.

Hopefully that helps.

Code:
#!/usr/bin/perl

use strict;
use warnings;

my %records;

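while(<>) {
    chomp;

    my ($id, $group, $value) = split;

    # first time this group is seen: keep the whole record
    if ( not exists $records{$group} ) {
        $records{$group} = [$id, $group, $value];
        next;
    }
    # already marked, nothing more to check for this group
    next if $records{$group}->[2] eq "missing";
    if ($records{$group}->[2] ne $value) {
        $records{$group}->[2] = "missing";
    }
}

# print each record tab-separated; join is used because $, has no
# effect on an array interpolated into a string
for my $group (keys %records) {
    print join("\t", @{$records{$group}}), "\n";
}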
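
For comparison, if the file were sorted (or at least grouped) by the second column, the hash could be dropped altogether: only the current group has to be kept, and each group can be printed as soon as the next one starts. A rough sketch under that assumption; the `process_grouped` sub, its two callbacks and the sample lines are hypothetical, and it has not been timed against the 140 million records:

```perl
use strict;
use warnings;

# Assumes lines with the same group are adjacent (e.g. after sorting on
# column 2). $next_line returns one line per call, or undef at the end.
# $emit is called once per group with (id, group, value).
sub process_grouped {
    my ($next_line, $emit) = @_;
    my @current;    # record of the group currently being read

    while ( defined( my $line = $next_line->() ) ) {
        chomp $line;
        my ($id, $group, $value) = split ' ', $line;
        if ( !@current or $current[1] ne $group ) {
            $emit->(@current) if @current;   # previous group is complete
            @current = ($id, $group, $value);
            next;
        }
        $current[2] = "missing" if $current[2] ne $value;
    }
    $emit->(@current) if @current;           # flush the last group
}

# Sample lines made up for this sketch, already grouped by column 2
my @sample = (
    "name2 id1group1 value1",
    "name4 id1group1 value2",
    "name1 id1group2 value1",
    "name2 id1group2 value2",
    "name1 id2group1 value2",
    "name2 id2group1 value2",
);
process_grouped(
    sub { shift @sample },
    sub { print join("\t", @_), "\n" },
);
```

To read real input, the first callback could be `sub { scalar(<>) }`. Memory use then stays constant, but the cost moves into the sort itself, and since the first line of each group is the one that gets printed, a stable sort (for example `sort -s -k2,2` with GNU sort) would be needed to keep the original order within a group.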


Last edited by Aia; 10-15-2014 at 07:47 PM.. Reason: grammar
# 13  
Old 10-15-2014
I tested with a small set and it worked fine. Running it on the main data now; will get back to you with fresh troubles :)
# 14  
Old 10-16-2014
Hello senhia83,

Kindly try the following code. I have tested it with your input file as well as with my own test input file. Hope this helps; I will be happy if it works for you.

Input file1:
Code:
cat group_test1
name2 id1group1 value1
name4 id1group1 value2
name1 id2group1 value2
name2 id2group1 value2
name4 id2group1 value2
name1 id1group2 value1
name2 id1group2 value2

Code as follows:
Code:
awk 'NR==1{X=$2;S[$2]=$0;} {if( X != $2 ){if(!S[$2]){S[$2]=$0;}}} {if( X == $2){if( Y != $3 ){split(S[$2],D," ");D[3]="missing";S[$2]=D[1] OFS D[2] OFS D[3];}}} {X=$2;Y=$3} END{for(u in S){print S[u]}}' group_test1

Output will be as follows.
Code:
name1 id2group1 value2
name2 id1group1 missing
name1 id1group2 missing

Now with my previous test file, the results are as follows:
Input file2:
Code:
cat group_test
name2 group1 value1
name1 group2 value1
name4 group1 value2
name2 group3 value2
name3 group3 value2
name2 group2 value2
name3 group2 value1
name1 group4 value1
name2 group4 value1
name1 group4 value1
name4 group4 value2
name2 group5 value2
name3 group5 value2
name2 group5 value2
name3 group5 value1
name3 group6 value1
name3 group6 value1

Code is as follows.
Code:
awk 'NR==1{X=$2;S[$2]=$0;} {if( X != $2 ){if(!S[$2]){S[$2]=$0;}}} {if( X == $2){if( Y != $3 ){split(S[$2],D," ");D[3]="missing";S[$2]=D[1] OFS D[2] OFS D[3];}}} {X=$2;Y=$3} END{for(u in S){print S[u]}}' group_test

Output is as follows.
Code:
name2 group1 missing
name1 group2 missing
name2 group3 value2
name1 group4 missing
name2 group5 missing
name3 group6 value1

EDIT: Adding a multi-line form of the solution too.
Code:
awk 'NR==1 {
    X = $2; S[$2] = $0
}
{
    if (X != $2) {
        if (!S[$2]) {
            S[$2] = $0
        }
    }
}
{
    if (X == $2) {
        if (Y != $3) {
            split(S[$2], D, " ")
            D[3] = "missing"
            S[$2] = D[1] OFS D[2] OFS D[3]
        }
    }
}
{ X = $2; Y = $3 }
END {
    for (u in S) { print S[u] }
}' group_test  ## Your input file name ##

Thanks,
R. Singh

Last edited by RavinderSingh13; 10-16-2014 at 11:53 AM.. Reason: Added a non one liner form of solution