Do replace operation and awk to sum multiple columns if another column has duplicate values


 
# 8  
Old 05-05-2018
I repeat:
Quote:
If you did not post homework, please explain the source of the data you have shown us and explain why you are trying to do this. The input you have shown us does not match the description you have provided, and the output you say you want from the sample input you provided cannot be derived from the instructions you have provided. If this is not homework, please also explain where you found the start of the awk script that you have shown us, and why you are unable to complete it.
# 9  
Old 05-05-2018
Hi RudiC,

Please check: I have put my code and the error it produces in my post.
Please check and help.
# 10  
Old 05-05-2018
Hi as7951,
I am very disappointed by your activities on the first day of this thread's life....

You started out with a post that contained inconsistent data: sample input that did not match the sample output you said you wanted to produce, and code from an unspecified source with mismatched quotes and uninitialized variables. These problems were pointed out by other posts in this thread over the next 12 hours, and then six hours later you made all of those comments look silly by silently replacing the sample input and sample output, and by replacing the code you had originally provided with code that will at least run without producing mismatched-quote diagnostics. Those changes mean that anyone reading your thread now will be confused about what posts #2 through #7 are talking about.

In the future, if you want to change the input, output, or code samples you are discussing, add them in a new post instead of replacing them in an earlier post.
# 11  
Old 05-05-2018
I'm confused as well, as I have lost track of what the input is and what the requirements and errors are. Where is $13 present in $16 in the (modified) post #1? And where are the summations in post #1's desired output?
# 12  
Old 05-06-2018
Trying hard to infer from your verbal specification and (inconsistent, as said before) samples, and polishing my crystal ball to its extreme, I came up with
Code:
awk -F\| '
NR == 1         {print                          # print the header line unchanged
                 next
                }
                {for (i=7; i<=15; i+=2)         # sum the odd-numbered fields 7-15 per field-5 key,
                     if ((SUM[i,$5] $i) != "") SUM[i,$5] += $i     # but only once a value has shown up
                }

!REC[$5]        {REC[$5] = $0                   # keep the first record seen for each field-5 key
                 next
                }
                {F16[$5] = F16[$5] DL[$5] $2    # collect field 2 of later records into field 16,
                 DL[$5]  = "^"                  # "^"-separated
                }
END             {for (r in REC) {$0 = REC[r]    # rebuild every kept record with the sums and
                                 for (i=7; i<=15; i+=2) $i = SUM[i,r]
                                 $16 = F16[r]   # the collected field 16, then print it
                                 print
                                }
                }
' OFS="|" file

How close would that be?
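A side note on the (SUM[i,$5] $i) != "" test in that accumulation block: concatenating the running total with the current field and testing the result for non-emptiness means a sum is only started once at least one non-empty value has been seen, so columns that never carry data stay empty in the output instead of becoming 0. A minimal, stand-alone illustration of the idiom (the x/y data below is made up for demonstration only):
Code:
printf 'x|5\nx|\nx|3\ny|\ny|\n' | awk -F'|' '
    # start or extend a sum only once a non-empty value has been seen for this key
    {if ((sum[$1] $2) != "") sum[$1] += $2}
    # x prints as x=[8]; y, which never had a value, prints as y=[] rather than y=[0]
    END {for (k in sum) printf "%s=[%s]\n", k, sum[k]}
'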
# 13  
Old 05-06-2018
Like RudiC, I find the statement in the updated post #1:
Quote:
The value which is present in column13 should not be present in column16.
to be confusing and to make no sense. It might make sense if the "column13" in that statement had been replaced by "column 2".

Like RudiC, I find it strange that an output field that is supposed to be a sum of one or more input fields is sometimes shown to have a sum that is an empty field. (I would expect a sum of one or more empty or non-empty fields to have a numeric value. But the desired output shown in post #1 has some empty fields that are supposed to be sums.)
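For what it's worth, awk coerces an empty field to 0 as soon as it takes part in arithmetic, which is why a plain sum += $i can never leave a sum empty. A one-line demonstration:
Code:
printf '|\n|\n' | awk -F'|' '{sum += $1} END {print "sum is [" sum "]"}'    # prints: sum is [0]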

Here is an alternative to the code RudiC suggested in post #12:
Code:
awk -v sum_fields='7,9,11,13,15' '
BEGIN {	# Run before reading 1st input file record.  Set input and output field
	# separators and create array sf[] with field values being the field
	# numbers of fields that are to be summed in the output records.  Also
	# set nsf to the number of output fields to be summed.
	FS = OFS = "|"
	nsf = split(sum_fields, sf, /,/)
}
NR == 1 {
	# Save header record to be used as first output record and skip to next
	# input record.
	rec = $0
	next
}
{	# Process each non-header input record:
	if ($5 == f[5]) {
		# This adds up fields to be summed in input records that have
		# the same ID as the previous input record.
		# Add in the previously accumulated values for the fields to
		# be summed.
		for(i = 1; i <= nsf; i++)
			f[sf[i]] = $sf[i] += f[sf[i]]
		# Update the field 16 value and set the field separator for
		# any remaining additions to this field in subsequent records.
		f[16] = $16 = f[16] sep $2
		sep = "^"
		# Reset field 2 to the value that was in the first input record
		# in this group.
		$2 = f[2]
	} else {
		# This is the first record in a new set.
		# Print the accumulated results for the last set of input
		# records and clear the field separator for field 16.
		print rec
		sep = ""
		# Turn empty fields to be summed into zero fields, save initial
		# values for fields 2 and 5, and clear field 16.
		for(i = 1; i <= nsf; i++)
			f[sf[i]] = $sf[i] += 0
		f[2] = $2
		f[5] = $5
		f[16] = $16 = ""
	}
	# Save current updated input record.  It will be an output record if the
	# next input record has a different ID.
	rec = $0
}
END {	# After processing the last input record, print the last output record.
	print rec
}' input.txt

The output produced by the code RudiC suggested in post #12 will have output records in an unspecified (effectively random) order. His code will also combine output from input records with the same ID in field #5 whether or not those records are adjacent in the input file. And, for fields being summed that only ever see empty values, the output for those fields will be empty fields.
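If the input-file order matters but the grouping of non-adjacent records is wanted, one possible variation on his script (sketched here purely for illustration, not something posted in this thread) is to remember the order in which each field-5 key is first seen and walk that list in the END block:
Code:
awk -F\| '
NR == 1         {print; next}
                {for (i=7; i<=15; i+=2) if ((SUM[i,$5] $i) != "") SUM[i,$5] += $i}
!REC[$5]        {ORD[++n] = $5                  # remember the first-seen order of each key
                 REC[$5]  = $0
                 next}
                {F16[$5] = F16[$5] DL[$5] $2
                 DL[$5]  = "^"}
END             {for (j = 1; j <= n; j++) {     # output in first-seen order, not "for (r in REC)" order
                     $0 = REC[ORD[j]]
                     for (i=7; i<=15; i+=2) $i = SUM[i,ORD[j]]
                     $16 = F16[ORD[j]]
                     print
                 }}
' OFS="|" file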

The output produced by the code I suggested above will have output records in the same order as the records that are found in the input file. But it will only combine output from input records with the same ID in field #5 if those records are adjacent in the input file. For fields being summed that only sum empty fields, the output for those fields will be zero fields.
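If the input is not guaranteed to arrive grouped on field #5, one way to get the adjacency my script relies on is to sort the body of the file on that field before summing. A rough sketch, assuming the header has to stay on line 1 and that the awk program above (minus the trailing input.txt operand) has been saved as sum_by_id.awk; both of those are assumptions, not part of the original request:
Code:
{ head -n 1 input.txt                           # keep the header on line 1
  tail -n +2 input.txt | sort -t '|' -k5,5      # group the remaining records by field 5
} | awk -v sum_fields='7,9,11,13,15' -f sum_by_id.awk

Note that the groups will then come out ordered by the field #5 value rather than in their original input order.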

With either of these suggestions, if you want to run these on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

With the input currently in post #1, my suggestion above produces the output:
Code:
a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p
    POS|IN27201800023963|2018-04-24||27AACCE5198E1ZJ||1500|0|0|9|135|9|135||0|IN272018000235^IN27201523963
    POS|IN27201800022938|2018-04-05||27AAJFH2012G1ZS||2|4|0|6|0|7|8||0|

instead of the output that was requested:
Code:
   a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p
    POS|IN27201800023963|2018-04-24||27AACCE5198E1ZJ||1500|0||9|135|9|135|||IN272018000235^IN27201523963|
    POS|IN27201800022938|2018-04-05||27AAJFH2012G1ZS||2|4||6||7|8|||

I see no reason why spaces have been added to the start of the 1st line of output, nor why an empty 17th field has been added to the 2nd line of the desired output. As you can see, the code I provided does not reproduce either of these requested, but unexplained, anomalies.