Do replace operation and awk to sum multiple columns if another column has duplicate values


 
# 8  
Old 05-05-2018
I repeat:
Quote:
If you did not post homework, please explain the source of the data you have shown us and explain why you are trying to do this. The input you have shown us does not match the description you have provided, and the output you say you want from the sample input you provided cannot be derived from the instructions you have provided. If this is not homework, please also explain where you found the start of the awk script that you have shown us, and why you are unable to complete it.
# 9  
Old 05-05-2018
Hi RudiC,

Please check: I have put my code and the error it produces in my post.
Please check and help.
# 10  
Old 05-05-2018
Hi as7951,
I am very disappointed by your activities on the first day of this thread's life....

You started out with a post that contained inconsistent data: sample input that did not match the sample output you said you wanted to produce, and code from an unspecified source with mismatched quotes and uninitialized variables. These problems were pointed out by other posts in this thread over the next 12 hours, and then six hours later you made all of those comments look silly by silently replacing the sample input and sample output, and by replacing the code you had originally provided with code that will at least run without producing mismatched-quote diagnostics. Those changes mean that anyone reading your thread now will be confused about what posts #2 through #7 are talking about.

In the future, if you want to change the input, output, or code samples you are discussing, add them in a new post instead of replacing them in an earlier post.
# 11  
Old 05-05-2018
I'm confused as well, as I have lost track of what the input is and what the requirements and errors are. Where is $13 present in $16 in the (modified) post #1? And where are the summations in post #1's desired output?
# 12  
Old 05-06-2018
Trying hard to infer from your verbal specification and (inconsistent, as said before) samples, and polishing my crystal ball to its extreme, I came up with
Code:
awk -F\| '
NR == 1         {print                          # print the header line unchanged
                 next
                }
                {for (i=7; i<=15; i+=2)         # sum the odd-numbered fields 7-15 per field-5 key,
                     if ((SUM[i,$5] $i) != "") SUM[i,$5] += $i     # but only once a value has shown up
                }

!REC[$5]        {REC[$5] = $0                   # keep the first record seen for each field-5 key
                 next
                }
                {F16[$5] = F16[$5] DL[$5] $2    # collect field 2 of later records into field 16,
                 DL[$5]  = "^"                  # "^"-separated
                }
END             {for (r in REC) {$0 = REC[r]    # rebuild every kept record with the sums and
                                 for (i=7; i<=15; i+=2) $i = SUM[i,r]
                                 $16 = F16[r]   # the collected field 16, then print it
                                 print
                                }
                }
' OFS="|" file

How close would that be?
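A side note on the (SUM[i,$5] $i) != "" test in that accumulation block: concatenating the running total with the current field and testing the result for non-emptiness means a sum is only started once at least one non-empty value has been seen, so columns that never carry data stay empty in the output instead of becoming 0. A minimal, stand-alone illustration of the idiom (the x/y data below is made up for demonstration only):
Code:
printf 'x|5\nx|\nx|3\ny|\ny|\n' | awk -F'|' '
    # start or extend a sum only once a non-empty value has been seen for this key
    {if ((sum[$1] $2) != "") sum[$1] += $2}
    # x prints as x=[8]; y, which never had a value, prints as y=[] rather than y=[0]
    END {for (k in sum) printf "%s=[%s]\n", k, sum[k]}
'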
# 13  
Old 05-06-2018
Like RudiC, I find the statement in the updated post #1:
Quote:
The value which is present in column13 should not be present in column16.
to be confusing and to make no sense. It might make sense if the "column13" in that statement had been replaced by "column 2".

Like RudiC, I find it strange that an output field that is supposed to be a sum of one or more input fields is sometimes shown to have a sum that is an empty field. (I would expect a sum of one or more empty or non-empty fields to have a numeric value. But the desired output shown in post #1 has some empty fields that are supposed to be sums.)
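For what it's worth, awk coerces an empty field to 0 as soon as it takes part in arithmetic, which is why a plain sum += $i can never leave a sum empty. A one-line demonstration:
Code:
printf '|\n|\n' | awk -F'|' '{sum += $1} END {print "sum is [" sum "]"}'    # prints: sum is [0]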

Here is an alternative to the code RudiC suggested in post #12:
Code:
awk -v sum_fields='7,9,11,13,15' '
BEGIN {	# Run before reading 1st input file record.  Set input and output field
	# separators and create array sf[] with field values being the field
	# numbers of fields that are to be summed in the output records.  Also
	# set nsf to the number of output fields to be summed.
	FS = OFS = "|"
	nsf = split(sum_fields, sf, /,/)
}
NR == 1 {
	# Save header record to be used as first output record and skip to next
	# input record.
	rec = $0
	next
}
{	# Process each non-header input record:
	if ($5 == f[5]) {
		# This adds up fields to be summed in input records that have
		# the same ID as the previous input record.
		# Add in the previously accumulated values for the fields to
		# be summed.
		for(i = 1; i <= nsf; i++)
			f[sf[i]] = $sf[i] += f[sf[i]]
		# Update the field 16 value and set the field separator for
		# any remaining additions to this field in subsequent records.
		f[16] = $16 = f[16] sep $2
		sep = "^"
		# Reset field 2 to the value that was in the first input record
		# in this group.
		$2 = f[2]
	} else {
		# This is the first record in a new set.
		# Print the accumulated results for the last set of input
		# records and clear the field separator for field 16.
		print rec
		sep = ""
		# Turn empty fields to be summed into zero fields, save initial
		# values for fields 2 and 5, and clear field 16.
		for(i = 1; i <= nsf; i++)
			f[sf[i]] = $sf[i] += 0
		f[2] = $2
		f[5] = $5
		f[16] = $16 = ""
	}
	# Save current updated input record.  It will be an output record if the
	# next input record has a different ID.
	rec = $0
}
END {	# After processing the last input record, print the last output record.
	print rec
}' input.txt

The output produced by the code RudiC suggested in post #12 will have output records in an unspecified (effectively random) order. His code will also combine output from input records with the same ID in field #5 whether or not those records are adjacent in the input file. And, for fields being summed that only ever see empty values, the output for those fields will be empty fields.
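If the input-file order matters but the grouping of non-adjacent records is wanted, one possible variation on his script (sketched here purely for illustration, not something posted in this thread) is to remember the order in which each field-5 key is first seen and walk that list in the END block:
Code:
awk -F\| '
NR == 1         {print; next}
                {for (i=7; i<=15; i+=2) if ((SUM[i,$5] $i) != "") SUM[i,$5] += $i}
!REC[$5]        {ORD[++n] = $5                  # remember the first-seen order of each key
                 REC[$5]  = $0
                 next}
                {F16[$5] = F16[$5] DL[$5] $2
                 DL[$5]  = "^"}
END             {for (j = 1; j <= n; j++) {     # output in first-seen order, not "for (r in REC)" order
                     $0 = REC[ORD[j]]
                     for (i=7; i<=15; i+=2) $i = SUM[i,ORD[j]]
                     $16 = F16[ORD[j]]
                     print
                 }}
' OFS="|" file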

The output produced by the code I suggested above will have output records in the same order as the records that are found in the input file. But it will only combine output from input records with the same ID in field #5 if those records are adjacent in the input file. For fields being summed that only sum empty fields, the output for those fields will be zero fields.
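If the input is not guaranteed to arrive grouped on field #5, one way to get the adjacency my script relies on is to sort the body of the file on that field before summing. A rough sketch, assuming the header has to stay on line 1 and that the awk program above (minus the trailing input.txt operand) has been saved as sum_by_id.awk; both of those are assumptions, not part of the original request:
Code:
{ head -n 1 input.txt                           # keep the header on line 1
  tail -n +2 input.txt | sort -t '|' -k5,5      # group the remaining records by field 5
} | awk -v sum_fields='7,9,11,13,15' -f sum_by_id.awk

Note that the groups will then come out ordered by the field #5 value rather than in their original input order.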

With either of these suggestions, if you want to run these on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

With the input currently in post #1, my suggestion above produces the output:
Code:
a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p
    POS|IN27201800023963|2018-04-24||27AACCE5198E1ZJ||1500|0|0|9|135|9|135||0|IN272018000235^IN27201523963
    POS|IN27201800022938|2018-04-05||27AAJFH2012G1ZS||2|4|0|6|0|7|8||0|

instead of the output that was requested:
Code:
   a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p
    POS|IN27201800023963|2018-04-24||27AACCE5198E1ZJ||1500|0||9|135|9|135|||IN272018000235^IN27201523963|
    POS|IN27201800022938|2018-04-05||27AAJFH2012G1ZS||2|4||6||7|8|||

I see no reason why spaces have been added to the start of the 1st line of output, nor why an empty 17th field has been added to the 2nd line of the desired output. As you can see, the code I provided does not reproduce either of these requested, but unexplained, anomalies.