Adding the values of repeated ids


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Adding the values of repeated ids
# 1  
Old 07-26-2017
Adding the values of repeated ids

File1 consist two columns, first some weired ids and second the numbers
Code:
t|v203.1@t|k88711.1	0.1
t|v190.1@t|k90369.1	0.01
t|v203.1@t|k88711.1	0.5
t|v322.1@t|k88711.1	0.2
t|v207.1@t|k90369.1	0.11
t|v326.1@t|k85939.1	0.5
t|v207.1@t|k90369.1	0.7
t|v207.1@t|k90369.1	0.3
t|v326.1@t|k89421.1	0.33

I want to add up the values of repeated ids and print 4 column File2
desired output File2
Code:
t|v203.1@t|k88711.1	0.1	0.1+0.5	0.6
t|v190.1@t|k90369.1	0.01	0.01	0.01
t|v322.1@t|k88711.1	0.2	0.2	0.2
t|v207.1@t|k90369.1	0.11	0.11+0.7+0.3	1.11
t|v326.1@t|k85939.1	0.5	0.5	0.5
t|v326.1@t|k89421.1	0.33	0.33	0.33

In the desired output file first and second columns are from file1, the third column shows what is going to be added for repeated ids, the fourth column is the values obtained by addition. The ids repeated will merge to one id as row 1 and 3 from File1 merged to row1 in File2.
# 2  
Old 07-26-2017
Code:
awk '
{ b[$1]=$0; c[$1]=(! c[$1] ? _ : c[$1] "+") $NF; d[$1]+=$NF; }
END { for (i in b) print b[i] "\t" c[i] "\t" d[i]; }
' infile

This User Gave Thanks to rdrtx1 For This Post:
# 3  
Old 07-26-2017
The col#2 differs from the requirement?
(Also the b[] stores the $1 string twice, once as the index and another time as part of its value.)
The col#2 seems redundant anyway, so it is omitted in the following solution:
Code:
awk '
{
  if ($1 in sum) {
    sum[$1]+=$2; str[$1]=(str[$1] "+" $2)
  } else {
    sum[$1]=str[$1]=$2
  }
}
END {
  for (i in sum) printf "%s\t%s\t%s\n", i, str[i], sum[i]
}
' infile

Because of the explicit if, adding the col#2 is simple - left as an exercise.

Last edited by MadeInGermany; 07-26-2017 at 04:38 PM..
# 4  
Old 07-27-2017
Alternative: datamash

Hi.

If you can live without the output field that indicates the detailed arithmetic, then this fairly simple command to sort, categorize, and sum associated values may be useful:
Code:
datamash --sort --group=1 sum 2

Here is a complete demonstastion script with output:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate arithmetic on group components, datamash.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C dixf datamash

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results:"
datamash --sort --group=1 sum 2 < $FILE

pl " Details for datamash:"
dixf datamash

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.8 (jessie) 
bash GNU bash 4.3.30
dixf (local) 1.49
datamash (GNU datamash) 1.0.6

-----
 Input data file data1:
t|v203.1@t|k88711.1     0.1
t|v190.1@t|k90369.1     0.01
t|v203.1@t|k88711.1     0.5
t|v322.1@t|k88711.1     0.2
t|v207.1@t|k90369.1     0.11
t|v326.1@t|k85939.1     0.5
t|v207.1@t|k90369.1     0.7
t|v207.1@t|k90369.1     0.3
t|v326.1@t|k89421.1     0.33

-----
 Results:
t|v190.1@t|k90369.1     0.01
t|v203.1@t|k88711.1     0.6
t|v207.1@t|k90369.1     1.11
t|v322.1@t|k88711.1     0.2
t|v326.1@t|k85939.1     0.5
t|v326.1@t|k89421.1     0.33

-----
 Details for datamash:
datamash        command-line calculations (man)
Path    : /usr/bin/datamash
Version : 1.0.6
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with -h,--help
Repo    : Debian 8.8 (jessie) 
Home    : https://savannah.gnu.org/projects/datamash/ (pm)

As noted, datamash can be found in some repositories and also at the gnu site.

Best wishes ... cheers, drl
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Issue using awk to print min values for unique ids

I am using the following script to search for and print minimum values for each individual Fields (3-14) for each unique id (Field 1). But when the field contains a "-99.99" ( I am ignoring "-99.99") and when the minimum value is the first line of a new id (Field 1), the output does not print Field... (13 Replies)
Discussion started by: ncwxpanther
13 Replies

2. Shell Programming and Scripting

Mapping the values of ids of two columns of file1 from file2

I have of two space separated files: ==> File1 <== PT|np_496075.1 st|K92748.1 st|K89648.1 PT|np_001300561.1 PT|np_497284.1 st|K90752.1 st|K90279.1 PT|np_740775.1 PT|np_497749.1 st|K90752.1 st|K92038.1 PT|np_490856.1 PT|np_497284.1 st|K90752.1 st|K88095.1 PT|np_494764.1 ==> File 2 <==... (2 Replies)
Discussion started by: sammy777888
2 Replies

3. Shell Programming and Scripting

Put repeated values into a table

Hi all, I have blocks of records like the following, each block ends in = in a new line, I want tabularize the entire output. The pattern is the same in every block although not all types are there in every block. For example gine3 is absent in the second block but present first and third. ... (7 Replies)
Discussion started by: ritakadm
7 Replies

4. Shell Programming and Scripting

Find repeated word and take sum of the second field to it ,for all the repeated words in awk

Hi below is the input file, i need to find repeated words and sum up the values of it which is second field from the repeated work.Im trying but getting no where close to it.Kindly give me a hint on how to go about it Input fruits,apple,20,fruits,mango,20,veg,carrot,12,veg,raddish,30... (11 Replies)
Discussion started by: 100bees
11 Replies

5. Shell Programming and Scripting

Resume and count repeated values

Gents, Please can you help me. Input file 1050 , 9 ,9888 1050 ,10 ,9888 1050 ,11 ,9888 1050 ,13 ,9888 1050 ,15 ,9888 1051 , 9 ,9889 1051 ,12 ,9889 1051 ,15 ,9889 1051 ,18 ,9889 1052 , 9 ... (7 Replies)
Discussion started by: jiam912
7 Replies

6. UNIX for Dummies Questions & Answers

Match values/IDs from column and text files

Hello, I am trying to modify 2 files, to yield results in a 3rd file. File-1 is a 8-columned file, separted with tab. 1234:1 xyz1234 blah blah blah blah blah blah 1234:1 xyz1233 blah blah blah blah blah blah 1234:1 abc1234 blah blah blah blah blah blah n/a RRR0000 blah blah blah... (1 Reply)
Discussion started by: ad23
1 Replies

7. UNIX for Dummies Questions & Answers

Extracting column if above certain values and repeated over a number of times continuously

Hi I am new to the forum and would like to ask: i have a file in form with thousands of column id.1 A01 A01 A68 A68 id.2 A5 A5 A3 A3 1001 0 0 0.136 0.136 1002 0 0 0.262 0.183 1003 0 0 0.662 0.662 1004 0 0 ... (9 Replies)
Discussion started by: newbeeuk
9 Replies

8. Shell Programming and Scripting

Adding values concatenating values

I have the following script in a shell # The start of a filename file=$(ls -tr $EMT*.dat | tail -1) # Select the latest file echo $file file_seq=$( < /u02/sct/banner/bandev2/xxxxxx/misc/EFTSQL.dat) echo $file_seq file2 = '$file_seq + 1' echo $file2 It is reading a file EFTSQL.dat... (3 Replies)
Discussion started by: rechever
3 Replies

9. Shell Programming and Scripting

comparing the values of repeated keys in multiple columns

Hi Guyz The 1st column of the input file has repeated keys like x,y and z. The ist task is if the 1st column has unique key (say x) and then need to consider 4th column, if it is + symbol then subtract 2nd column value with 3rd column value (we will get 2(10-8)) or if it is - symbol subtract 3rd... (3 Replies)
Discussion started by: repinementer
3 Replies

10. Shell Programming and Scripting

To Delete the repeated occurances and print in same line by appending values

Hi All, I am trying to work on below text a b c 1 a b c 2 a b c 3 x y z 6 x y z 44 a b c 89 Need to delete the occurances and get in single line like below: a b c 1 2 3 89 x y z 6 44 89 Please help me i am new into unix scripting ..... ---------- Post updated at 03:00... (8 Replies)
Discussion started by: shaliniyadav
8 Replies
Login or Register to Ask a Question