Group sums by matching and then multiplying by weights


 
# 1  
Old 02-19-2015

Hi Experts,

Please help with the following.

File 1 has 3 columns: group, variable, and value, with variables nested within groups.
Code:
File 1

gr1 var1 a
gr1 var2 b
gr1 var3 a
gr1 var4 c
gr2 var1 a
gr2 var2 a
gr2 var4 c
gr3 var1 b
gr3 var3 b
gr3 var4 a
gr3 var5 a

The actual File 1 has 8000 groups, and File 2 has 4000 variables.

Code:
File 2

var1 10 a b
var2  -20 a b
var3 -10 a c

The final evaluation of a group is defined as the sum over its variables.
I want to match each variable's value in File 1 against cols 3 and 4 of the same variable's row in File 2: if it matches col 3, the weight is 1; if it matches col 4, the weight is -1; if it matches neither, the weight is 0.
For example, gr1 var1 has value a, which matches col 3 of var1 in File 2, so the weight is 1. gr1 var2 has b, which matches col 4 of var2, so the weight is -1. gr3 var3 has b, which matches neither col 3 nor col 4 of var3, so the weight is 0. For all missing values the weight is also 0 (for example, gr2 has no var3).

So the final evaluations are calculated as:

group value = sum of (weight x col 2 of File 2), taken only over variables present in File 2. All other variables in File 1 (here var4 and var5) are ignored.

Code:
gr1  (1 x 10) + (-1 x -20) + (1 x -10) = 10 + 20 - 10 = 20
gr2  (1 x 10) + (1 x -20) + (0 x -10) = 10 - 20 = -10
gr3  (-1 x 10) + (0 x -20) + (0 x -10) = -10

Desired output
Code:
gr1 20
gr2 -10
gr3 -10

Here's where I am stuck:

Code:
awk 'NR==FNR { a[$1] = $3 ;
               b[$1] = $4 ;
               next }
     $3 in a { w = 1 }
     $3 in b { w = -1 }
     $3 ! in a && $3 ! in b { w = 0 }
     { print $1, w * $3 }' File2 File1 | awk 'BEGIN{a[$1]+=$2}END{ for (i in a) print i,a[i]}'


I have used integer numbers for simplicity but actual numbers are decimal in File 2.
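
A minimal sketch of how the attempt above could be repaired, offered as one possibility rather than the definitive fix. The attempt has several problems: `$3 in a` tests whether the value is an index of `a`, but the indices are variable names; `$3 ! in a` is not valid awk (negation is written `!($3 in a)`); `w * $3` multiplies by the letter rather than by col 2 of File 2; and the second awk does its summing in `BEGIN`, which runs before any input is read. The helper name `group_sums`, the array names `num`, `pos`, `neg` and `total`, the temporary directory, and the final `sort` are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: sample data copied from the post into a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/File1" <<'EOF'
gr1 var1 a
gr1 var2 b
gr1 var3 a
gr1 var4 c
gr2 var1 a
gr2 var2 a
gr2 var4 c
gr3 var1 b
gr3 var3 b
gr3 var4 a
gr3 var5 a
EOF

cat > "$dir/File2" <<'EOF'
var1 10 a b
var2  -20 a b
var3 -10 a c
EOF

# group_sums FILE2 FILE1 (hypothetical helper): print "group total" per group.
group_sums() {
    awk '
        NR == FNR {            # first file (File 2): keep col 2 and both reference values
            num[$1] = $2
            pos[$1] = $3
            neg[$1] = $4
            next
        }
        {                      # second file (File 1): one group/variable/value row
            w = 0              # default weight: no match, or variable not in File 2
            v = 0
            if ($2 in num) {
                v = num[$2]
                if ($3 == pos[$2]) {
                    w = 1
                } else if ($3 == neg[$2]) {
                    w = -1
                }
            }
            total[$1] += w * v
        }
        END { for (g in total) print g, total[g] }
    ' "$1" "$2" | sort         # "for (g in total)" has no guaranteed order
}

group_sums "$dir/File2" "$dir/File1"
```

awk does its arithmetic in floating point, so decimal numbers in col 2 of File 2 need no special handling. `print` formats non-integer numbers with `OFMT` (default `%.6g`), so if more than six significant digits are needed, `printf` with an explicit format such as `%.10g` is one option.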

Last edited by ritakadm; 02-19-2015 at 01:34 AM..
# 2  
Old 02-19-2015
While I must learn AWK "one of these days soon"(TM), the following Perl solution does the job required.
Code:
skrynesaver@busybox ~/$ cat tmp/tmp.pl
#!/usr/bin/perl
use strict;

# Hash of variable => value => signed number:
# the col 3 value maps to +col2, the col 4 value maps to -col2.
my %var;
open(my $variables, '<', "$ENV{HOME}/tmp/file2.dat")
    or die "Cannot open file2.dat: $!";
while (<$variables>) {
    my @r = split(/\s+/, $_);
    $var{$r[0]}{$r[2]} = $r[1];
    $var{$r[0]}{$r[3]} = $r[1] * -1;
}
close($variables);

# Hash of group => running total of the signed numbers.
my %sum;
open(my $groups, '<', "$ENV{HOME}/tmp/file1.dat")
    or die "Cannot open file1.dat: $!";
while (<$groups>) {
    my @r = split(/\s+/, $_);
    # Unknown variables and non-matching values contribute 0.
    $sum{$r[0]} += ($var{$r[1]}{$r[2]} ? $var{$r[1]}{$r[2]} : 0);
}
close($groups);

for my $group (sort keys %sum) {
    print "$group $sum{$group}\n";
}

skrynesaver@busybox ~/$ perl tmp/tmp.pl
gr1 20
gr2 -10
gr3 -10

# 3  
Old 02-19-2015
Although I'm not a friend of one-liners, this one is really short (file3 holds the File 2 data, file4 the File 1 data):
Code:
awk 'FNR==NR {W[$1,$3]=$2; W[$1, $4]=-$2;next} {SUM[$1]+=W[$2, $3]} END {for (s in SUM) print s, SUM[s]}' file3 file4
gr1 20
gr2 -10
gr3 -10
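
For readers new to awk, the one-liner above can be unpacked roughly as follows. This is only a sketch of the same idea: `W[$1, $3]` is indexed by the (variable, value) pair, so a single lookup yields +col2, -col2, or an empty value that counts as 0 in arithmetic. The helper name `weighted_sums`, the temporary directory, and the final `sort` are assumptions added for illustration; the data is the sample from the first post.

```shell
#!/bin/sh
# Sketch only: sample data from the first post, in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/File1" <<'EOF'
gr1 var1 a
gr1 var2 b
gr1 var3 a
gr1 var4 c
gr2 var1 a
gr2 var2 a
gr2 var4 c
gr3 var1 b
gr3 var3 b
gr3 var4 a
gr3 var5 a
EOF

cat > "$dir/File2" <<'EOF'
var1 10 a b
var2  -20 a b
var3 -10 a c
EOF

# weighted_sums FILE2 FILE1 (hypothetical helper): same logic as the one-liner.
weighted_sums() {
    awk '
        FNR == NR {                 # true only while reading the first file (File 2)
            W[$1, $3] = $2          # value in col 3 -> weight +1, so store +col2
            W[$1, $4] = -$2         # value in col 4 -> weight -1, so store -col2
            next
        }
        {                           # File 1: add the stored number for this pair;
            SUM[$1] += W[$2, $3]    # a pair never stored is empty and adds 0
        }
        END { for (s in SUM) print s, SUM[s] }
    ' "$1" "$2" | sort              # array traversal order is unspecified
}

weighted_sums "$dir/File2" "$dir/File1"
```

Neither input needs to be sorted: File 2 is loaded into the array completely before any line of File 1 is read, and the per-group totals accumulate regardless of row order.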

# 4  
Old 02-19-2015
Thanks a lot to both of you. I'm getting different answers from your two scripts. Does the data need to be sorted for either or both?

Update:

It was the \r (carriage return) that was messing it up. Thanks a lot.
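
A sketch of one way to make the awk approach tolerate such line endings, assuming the files have DOS-style CRLF terminators. awk's default field splitting does not treat `\r` as whitespace, so a trailing carriage return stays attached to the last field, and `a` no longer compares equal to `a` followed by `\r`. Perl's `\s+` does match `\r`, which would explain why the two scripts could disagree. The helper name `sums_crlf_safe` and the temporary files are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: the sample data written with CRLF line endings on purpose.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\r\n' \
    'gr1 var1 a' 'gr1 var2 b' 'gr1 var3 a' 'gr1 var4 c' \
    'gr2 var1 a' 'gr2 var2 a' 'gr2 var4 c' \
    'gr3 var1 b' 'gr3 var3 b' 'gr3 var4 a' 'gr3 var5 a' > "$dir/File1"

printf '%s\r\n' \
    'var1 10 a b' 'var2  -20 a b' 'var3 -10 a c' > "$dir/File2"

# sums_crlf_safe FILE2 FILE1 (hypothetical helper)
sums_crlf_safe() {
    awk '
        { sub(/\r$/, "") }          # drop a trailing carriage return; fields are re-split
        FNR == NR { W[$1, $3] = $2; W[$1, $4] = -$2; next }
        { SUM[$1] += W[$2, $3] }
        END { for (s in SUM) print s, SUM[s] }
    ' "$1" "$2" | sort
}

sums_crlf_safe "$dir/File2" "$dir/File1"
```

Converting the files once beforehand is another option, for example `tr -d '\r' < File1 > File1.unix` (the output name is just an example).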

Last edited by ritakadm; 02-19-2015 at 12:34 PM..
 