Unique values in a row sum the next column in UNIX


 
# 1  
Old 07-02-2014

Hi, I would like to ask for your advice on my problem.

I have this kind of data
file.txt
Code:
111111111,20
111111111,50
222222222,70
333333333,40
444444444,10
444444444,20

I need to get this
file1.txt
Code:
111111111,70
222222222,70
333333333,40
444444444,30

Using this code I can obtain the output:
Code:
awk -F, '{a[$1]+=$2;}END{for(i in a)print i","a[i];}' file.txt > file1.txt

The problem starts when it processes my actual file, which has almost 20 million records.

Any other suggestions?

Last edited by radoulov; 07-02-2014 at 12:15 PM..
# 2  
Old 07-02-2014
Hello,

Kindly use code tags as per the forum rules.

If the order of the output doesn't matter to you, the following may help.

Code:
awk -F, '{a[$1]+=$2;} END{for(i in a) {print i OFS a[i]}}' OFS=, get_fixed_column_sum_check_test_check12112334

Output will be as follows.

Code:
444444444,30
111111111,70
222222222,70
333333333,40

If you need the output in order, please use the following.

Code:
awk -F, '{a[$1]+=$2;} END{for(i in a) {print i OFS a[i]}}' OFS=, get_fixed_column_sum_check_test_check12112334 | sort -t, -k1,1

Output will be as follows.

Code:
111111111,70
222222222,70
333333333,40
444444444,30

NOTE: get_fixed_column_sum_check_test_check12112334 is the input file name.


Thanks,
R. Singh
"GOD helps those who help themselves."
# 3  
Old 07-02-2014
I'm still getting this error

Code:
awk: 0602-561 There is not enough memory available now.
 The input line number is 1.36372e+07. The file is test.txt.
 The source line number is 1.


Last edited by radoulov; 07-02-2014 at 12:16 PM..
# 4  
Old 07-02-2014
Is your file already sorted?
# 5  
Old 07-02-2014
If sorted, you only need to store the 1st field of the previous line and the actual sum:
Code:
awk -F, '$1!=p1 && NR>1 {print p1","sum; sum=0} {p1=$1; sum+=$2} END {if (NR>0) print p1","sum}' file.txt
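For anyone who wants to see the logic spelled out, here is a sketch of the same one-liner split across lines with comments. The data is the sample from post #1; the file names sample.txt and sums.txt are only placeholders for this illustration.

```shell
# Sample data from the first post (sample.txt is a placeholder name).
printf '%s\n' 111111111,20 111111111,50 222222222,70 \
              333333333,40 444444444,10 444444444,20 > sample.txt

awk -F, '
    # The key changed and this is not the first line: print the finished group.
    NR > 1 && $1 != p1 { print p1 "," sum; sum = 0 }
    # Every line: remember the current key and add its value to the running sum.
    { p1 = $1; sum += $2 }
    # End of input: print the last group, unless the file was empty.
    END { if (NR > 0) print p1 "," sum }
' sample.txt > sums.txt

cat sums.txt
```

Only two variables (p1 and sum) are kept at any time, so memory use does not grow with the number of lines.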

# 6  
Old 07-02-2014
Thanks, MadeInGermany.

This works.

By the way, can you please explain what happened?
# 7  
Old 07-02-2014
Since you didn't say anything about whether or not your input file was sorted, the earlier suggestions had to make the assumption that lines in your 20,000,000 line file were in random order. Therefore, all of the key values and the sums of the corresponding 2nd fields had to be kept in memory until the entire file had been read. Then the totals could be printed for each of the different keys present in the file. The error messages you got say that awk ran out of memory trying to accumulate all of the data.

Now that we know that all of the lines with a given key are adjacent in your input file, the later script can print a sum for each key as soon as a new key is found. Very little memory is required to do that, and the script runs much faster because it needs fewer system resources to get the job done.
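If the input had not been sorted, one possible approach (a sketch, not something tested on a 20-million-line file) would be to sort it on the first field and pipe the result into the same streaming awk program. This assumes your sort utility can handle files larger than memory by spilling to temporary files, which typical implementations do. The file names unsorted.txt and sums_sorted.txt are placeholders, and LC_ALL=C is my assumption to keep the ordering a plain byte comparison.

```shell
# Unsorted sample input (unsorted.txt is a placeholder name).
printf '%s\n' 444444444,10 111111111,20 333333333,40 \
              111111111,50 444444444,20 222222222,70 > unsorted.txt

# Sort on the first comma-separated field so that equal keys become adjacent,
# then stream the result through the awk program used for sorted input.
LC_ALL=C sort -t, -k1,1 unsorted.txt |
awk -F, '
    NR > 1 && $1 != p1 { print p1 "," sum; sum = 0 }
    { p1 = $1; sum += $2 }
    END { if (NR > 0) print p1 "," sum }
' > sums_sorted.txt

cat sums_sorted.txt
```

The trade-off is extra time and temporary disk space for the sort, in exchange for awk no longer having to hold every key in an array.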