Do replace operation and awk to sum multiple columns if another column has duplicate values (Post 303016930 by Don Cragun, 06 May 2018)
Like RudiC, I find the statement in the updated post #1:
Quote:
The value which is present in column13 should not be present in column16.
to be confusing and to make no sense. It might make sense if the "column13" in that statement had been replaced by "column 2".

Like RudiC, I find it strange that an output field that is supposed to be a sum of one or more input fields is sometimes shown to have a sum that is an empty field. (I would expect a sum of one or more empty or non-empty fields to have a numeric value. But the desired output shown in post #1 has some empty fields that are supposed to be sums.)
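
To illustrate why I expect a numeric result, here is a minimal, standalone sketch (not part of the script below) showing that awk treats an empty field as 0 when it is used in arithmetic:
Code:
# Field 2 of the sample record is empty; adding 0 to it still yields the number 0:
printf 'a||b\n' | awk -F'|' '{print "sum of empty field 2:", $2 + 0}'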

Here is an alternative to the code RudiC suggested in post #12:
Code:
awk -v sum_fields='7,9,11,13,15' '
BEGIN {	# Run before reading 1st input file record.  Set input and output field
	# separators and create array sf[] with field values being the field
	# numbers of fields that are to be summed in the output records.  Also
	# set nsf to the number of output fields to be summed.
	FS = OFS = "|"
	nsf = split(sum_fields, sf, /,/)
}
NR == 1 {
	# Save header record to be used as first output record and skip to next
	# input record.
	rec = $0
	next
}
{	# Process each non-header input record:
	if ($5 == f[5]) {
		# This adds up fields to be summed in input records that have
		# the same ID as the previous input record.
		# Add in the previously accumulated values for the fields to
		# be summed.
		for(i = 1; i <= nsf; i++)
			f[sf[i]] = $sf[i] += f[sf[i]]
		# Update the field 16 value and set the field separator for
		# any remaining additions to this field in subsequent records.
		f[16] = $16 = f[16] sep $2
		sep = "^"
		# Reset field 2 to the value that was in the first input record
		# in this group.
		$2 = f[2]
	} else {
		# This is the first record in a new set.
		# Print the accumulated results for the last set of input
		# records and clear the field separator for field 16.
		print rec
		sep = ""
		# Turn empty fields to be summed into zero fields, save initial
		# values for fields 2 and 5, and clear field 16.
		for(i = 1; i <= nsf; i++)
			f[sf[i]] = $sf[i] += 0
		f[2] = $2
		f[5] = $5
		f[16] = $16 = ""
	}
	# Save current updated input record.  It will be an output record if the
	# next input record has a different ID.
	rec = $0
}
END {	# After processing the last input record, print the last output record.
	print rec
}' input.txt
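
If you want to keep the script reusable, the awk body above (everything between the single quotes) could be saved in a file; sumdup.awk below is just an illustrative name, not something from this thread. The columns to be summed can then be changed on the command line without editing the script:
Code:
# Sum a different set of columns (here 7, 9, and 11) with the same script
# saved in a file named sumdup.awk (an illustrative name):
awk -v sum_fields='7,9,11' -f sumdup.awk input.txt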

The output produced by the code RudiC suggested in post #12 will have output records in random order. His code will also combine output from input records with the same ID in field #5 whether or not those records are adjacent in the input file. And, for fields being summed where every input field is empty, the output for those fields will be empty fields.

The output produced by the code I suggested above will have output records in the same order as the records found in the input file. But, it will only combine output from input records with the same ID in field #5 if those records are adjacent in the input file. For fields being summed where every input field is empty, the output for those fields will be zero fields.
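
If records with the same ID in field #5 might not be adjacent in your input, one way to still use my script is to keep the header line first and sort the remaining lines on field #5 before piping them into awk. This is only a sketch under that assumption (it reorders the data records, and again uses the illustrative file name sumdup.awk):
Code:
# Keep the header first, sort the data lines on field 5, and feed the result
# to the script shown above:
{ head -n 1 input.txt; tail -n +2 input.txt | sort -t'|' -k5,5; } |
    awk -v sum_fields='7,9,11,13,15' -f sumdup.awk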

With either of these suggestions, if you want to run them on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
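
For example, on a Solaris system the invocation would look like this (again assuming the script has been saved as the illustratively named sumdup.awk):
Code:
# Same invocation on Solaris/SunOS, using the POSIX-conforming awk:
/usr/xpg4/bin/awk -v sum_fields='7,9,11,13,15' -f sumdup.awk input.txt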

With the input currently in post #1, my suggestion above produces the output:
Code:
a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p
    POS|IN27201800023963|2018-04-24||27AACCE5198E1ZJ||1500|0|0|9|135|9|135||0|IN272018000235^IN27201523963
    POS|IN27201800022938|2018-04-05||27AAJFH2012G1ZS||2|4|0|6|0|7|8||0|

instead of the output that was requested:
Code:
   a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p
    POS|IN27201800023963|2018-04-24||27AACCE5198E1ZJ||1500|0||9|135|9|135|||IN272018000235^IN27201523963|
    POS|IN27201800022938|2018-04-05||27AAJFH2012G1ZS||2|4||6||7|8|||

I see no reason why spaces have been added to the start of the 1st line of output, nor why an empty 17th field has been added to the 2nd line of the desired output. As you can see, the code I provided does not reproduce either of these requested, but unexplained, anomalies.
 
