UNIX for Dummies Questions & Answers: post 302935747 by ritakadm on Thursday 19th of February 2015 12:28:44 AM
Group sums by matching and then multiplying by weights

Hi Experts,

Please help with the following.

File 1 has 3 columns: group, variable, and value. Variables (with their values) are nested within groups.
Code:
File 1

gr1 var1 a
gr1 var2 b
gr1 var3 a
gr1 var4 c
gr2 var1 a
gr2 var2 a
gr2 var4 c
gr3 var1 b
gr3 var3 b
gr3 var4 a
gr3 var5 a

The actual File 1 has 8000 groups, and File 2 has 4000 variables.

Code:
File 2

var1 10 a b
var2  -20 a b
var3 -10 a c

The final evaluation of a group is defined as the sum over its variables.
I want to match each variable's value against cols 3 and 4 of File 2 for that variable: if it matches col 3, the weight is 1; if it matches col 4, the weight is -1; if it matches neither, the weight is 0.
For example, gr1 var1 has value a, which matches col 3 of var1 in File 2, so its weight is 1. gr1 var2 has b, which matches col 4 of var2, so its weight is -1. gr3 var3 has b, which matches neither col 3 nor col 4 of var3, so its weight is 0. A variable that is missing from a group also gets weight 0.
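
To make the weight rule concrete, here is a minimal sketch of it as an awk function, checked against the three cases described above. The helper name weight and its argument names are only illustrative, not part of any existing script.

```shell
# Weight rule from the post: 1 if the value matches File 2 col 3,
# -1 if it matches col 4, 0 otherwise.
# weight(), val, pos and neg are hypothetical names chosen for this sketch.
result=$(awk '
  function weight(val, pos, neg) {
    return (val == pos) ? 1 : (val == neg) ? -1 : 0
  }
  BEGIN {
    print weight("a", "a", "b")   # gr1 var1: a matches col 3 of var1
    print weight("b", "a", "b")   # gr1 var2: b matches col 4 of var2
    print weight("b", "a", "c")   # gr3 var3: b matches neither column
  }')
printf '%s\n' "$result"
```

The three printed weights correspond to the gr1 var1, gr1 var2 and gr3 var3 cases in the description.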

So the final evaluations are calculated as:

group value = sum of (weight x col 2 in File 2), taken only over the variables present in File 2. All other variables in File 1 are ignored.

Code:
gr1  (1 x 10) + (-1 x -20) + (1 x -10) = 10 + 20 - 10 = 20
gr2  (1 x 10) + (1 x -20) + (0 x -10) = 10 - 20 + 0 = -10
gr3  (-1 x 10) + (0 x -20) + (0 x -10) = -10 + 0 + 0 = -10

Desired output
Code:
gr1 20
gr2 -10
gr3 -10

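Here's the awk approach I was stuck on, with the matching and summing corrected:

Code:
awk 'NR==FNR { coef[$1] = $2     # File2: coefficient per variable
               pos[$1] = $3      # value that earns weight 1
               neg[$1] = $4      # value that earns weight -1
               next }
     !($1 in sum) { sum[$1] = 0; order[++n] = $1 }   # remember every group, in input order
     $2 in coef { w = ($3 == pos[$2]) ? 1 : ($3 == neg[$2]) ? -1 : 0
                  sum[$1] += w * coef[$2] }          # variables not in File2 are skipped
     END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }' File2 File1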


I have used integers for simplicity, but the actual numbers in File 2 are decimals.
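
Because the real values are decimals, output formatting may matter: awk's plain print converts non-integer numbers using OFMT (default %.6g), so longer sums can come out rounded. Below is a minimal sketch of the per-group summing step with explicit printf formatting. The decimal contributions are made up for illustration, and the three-decimal precision is an arbitrary choice.

```shell
# Sum "group contribution" pairs per group and print with fixed precision.
# The input numbers are invented sample data, not values from the real File 2.
result=$(printf '%s\n' 'gr1 10.25' 'gr1 20.5' 'gr1 -10.125' 'gr2 10.25' 'gr2 -20.5' |
  awk '{ sum[$1] += $2 }                                   # accumulate on every input line
       END { for (g in sum) printf "%s %.3f\n", g, sum[g] }' |
  sort)                                                    # for-in order is unspecified, so sort
printf '%s\n' "$result"
```

The accumulation has to happen in the main rule rather than in BEGIN, since BEGIN runs before any input line is read.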

Last edited by ritakadm; 02-19-2015 at 01:34 AM..
 
