Get the average of lines with the same first 4 letters

08-14-2018

How can I sum up the second field for each group of lines and print that group's total SUM on the line after it?

Code:

$ cat hhhh
aaa1a 1
aaa1g 2
aaa1f 3
baa4f 3
baa4d 4
baa4s 4
cddg1 3
cddg3 4
cddfg 1

$ cat hhhh|awk ' {sum+=$2} END {print sum}'
25

Desired output:

Code:

aaa1a 1
aaa1g 2
aaa1f 3

Total Sum of aaa1: 6

=======================
baa4f 3
baa4d 4
baa4s 4
 
Total Sum of baa4: 11

=======================
cddg1 3
cddg3 4
cddfg 1
Total Sum of cddg: 8

kenshinhimura

08-14-2018

You might try something like:

Code:

awk '
function print_total() {
	printf("\nTotal Sum of %s: %d\n", last, total)
}
last != substr($1, 1, 4) {
	if(NR > 1) {
		print_total()
		printf("\n=======================\n")
	}
	last = substr($1, 1, 4)
	total = 0
}
{	print
	total += $2
}
END {	print_total()
}' hhhh

But with the sample data you provided, I get this output:

Code:

aaa1a 1
aaa1g 2
aaa1f 3

Total Sum of aaa1: 6

=======================
baa4f 3
baa4d 4
baa4s 4

Total Sum of baa4: 11

=======================
cddg1 3
cddg3 4

Total Sum of cddg: 7

=======================
cddfg 1

Total Sum of cddf: 1

instead of what you said you wanted. The output above seems to match the title of this thread more closely: cddfg starts with cddf, not cddg, so it gets a group of its own. If this isn't what you really wanted, please explain your requirements more clearly.

You should always tell us what operating system and shell you're using when you start a new thread. Otherwise, suggestions you receive might not work in your environment. In this case, if you're using a Solaris/SunOS operating system, change awk in the above suggestion to /usr/xpg4/bin/awk or nawk.

Note that using cat as you did in your sample code consumes extra system resources and makes your code slower than letting awk read the file directly (as I did in my suggestion above).

Don Cragun
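As an illustration of the note about cat: the sum from the first post works just as well when awk is given the file name directly, which avoids starting an extra cat process and a pipe. A minimal sketch; the printf line only recreates the sample file hhhh so the snippet runs on its own:

```shell
# Recreate the sample file hhhh from the first post.
printf '%s\n' 'aaa1a 1' 'aaa1g 2' 'aaa1f 3' 'baa4f 3' 'baa4d 4' \
    'baa4s 4' 'cddg1 3' 'cddg3 4' 'cddfg 1' > hhhh

# Pass the file name as an operand; awk opens and reads it itself,
# so no cat process or pipe is needed.
awk '{ sum += $2 } END { print sum }' hhhh
```

With the sample data this prints 25, the same result as the cat | awk pipeline.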

08-15-2018

Thank you, Don. I also have a question: is there a way to turn that into a one-liner? I would understand it more easily, and I would be able to repeat it someday.
Thank you

------ Post updated at 07:39 AM ------

Also, if you can make it a one-liner, please do. If not, could you please explain each line by adding comments to them?

kenshinhimura

08-15-2018

Of course my suggested awk script can be turned into a single line of code. But if you can't do that on your own, there is absolutely no way that the 1-liner version of that code will be easier for you to understand than the version I provided before. Code is always easier to understand when indentation shows its structure and how the commands are connected. And you can't include comments in the middle of an awk 1-liner, so I can't provide you with a commented 1-liner.

In the hope that comments will help you understand what the code is doing, I provide the following:

Code:

awk '	# Invoke awk and start the script specifying actions to be performed.
function print_total() {
	# Define function to print the total accumulated for the previous set
	# of lines with the same four characters at the start of the 1st field.
	printf("\nTotal Sum of %s: %d\n", last, total)
}
last != substr($1, 1, 4) {
	# Perform the actions in this group when the 1st four characters in the
	# 1st field on this line are not the same as the 1st four characters in
	# the 1st field that we saw on the previous line.  Since there is no
	# previous line when we are reading the 1st line from our input file,
	# we also perform the actions in this group when processing the 1st
	# line in the input file.
	if(NR > 1) {
		# Perform these actions when we are not processing line #1.
		print_total()				# Print the total for
							# the previous set.
		printf("\n=======================\n")	# Print a set separator.
	}
	# Save the 1st four characters from this line to compare against
	# subsequent input lines.
	last = substr($1, 1, 4)
	# Clear the total for this new set.
	total = 0
}
{	# Perform the actions in this group for every line in the input file.
	print		# Print the current input line.
	total += $2	# Add the contents of the 2nd field on this line to the
			# total for this set.
}
END {	# Perform the actions in this group after we have read the last line of
	# input from the input file.
	print_total()	# Print the total for the last set.
}' hhhh	# End the awk script and name the file to be processed.

Don Cragun
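For reference, a sketch of Don's first script collapsed onto a single line. The logic is unchanged; only the newlines inside actions were replaced by semicolons. Putting a function definition and several rules on one line like this is accepted by common awks such as gawk and mawk; other implementations are untested here, and on Solaris the awk/nawk note above still applies. The printf line only recreates the sample file hhhh:

```shell
# Recreate the sample file hhhh from the first post.
printf '%s\n' 'aaa1a 1' 'aaa1g 2' 'aaa1f 3' 'baa4f 3' 'baa4d 4' \
    'baa4s 4' 'cddg1 3' 'cddg3 4' 'cddfg 1' > hhhh

# Same pieces as the multi-line version: a function, a rule for the start of
# each new 4-character prefix, a rule for every line, and an END rule.
awk 'function print_total() { printf("\nTotal Sum of %s: %d\n", last, total) } last != substr($1, 1, 4) { if (NR > 1) { print_total(); printf("\n=======================\n") } last = substr($1, 1, 4); total = 0 } { print; total += $2 } END { print_total() }' hhhh
```

It prints the same output shown earlier for the multi-line version, and comparing the two makes it clear why the indented, commented form is easier to read and maintain.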

08-15-2018

Don,
How can we get the average based on the sum?

------ Post updated at 09:16 AM ------

I want to get both the total and the average:

Code:

aaa1a 1
aaa1g 2
aaa1f 3

Total Sum of aaa1: 6

Total Average of aaa1: 2 #because 6/3

kenshinhimura

08-15-2018

Why don't you try adding processing for averages to the code I suggested?

If you can't get it to work, show us what you have tried and we'll help you fix it.

Don Cragun

08-15-2018

Code:

awk '
function print_total() {
        printf("\nTotal Sum of %s site is: %d Mbps\n ", last, total)
}
function print_average() {
        printf("\nTotal Sum of %s site is: %d Mbps\n ", last, average)
}

last != substr($1, 1, 7) {
        if(NR > 1) {
                print_total()
                printf("\n=======================\n")
        }
        last = substr($1, 1, 7)
        total = 0
        average = 0
}
{       print
        total += $2
        average total/NR
}
END {   print_total()
        print_average()
}' hhhh > datarate.log
cat datarate.log

I have tried different combinations, but to no avail. Thank you.

kenshinhimura
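One possible way to get per-group averages, sketched on the sample data from the first post. A few things keep the attempt above from working: "average total/NR" has no "=", so it only concatenates two values and assigns nothing; NR is the number of lines read so far from the whole file, not the number of lines in the current group; print_average() reuses the "Total Sum" label and is only called from END, so at most the last group would get an average; and %d would truncate a fractional average. The sketch below adds a per-group line counter (the variable name count is chosen here for illustration) and prints both values from print_total(). It keeps Don's 4-character prefix and labels to match the sample data; the 7-character prefix and Mbps labels from the attempt can be swapped back in for the real data.

```shell
# Recreate the sample file hhhh from the first post.
printf '%s\n' 'aaa1a 1' 'aaa1g 2' 'aaa1f 3' 'baa4f 3' 'baa4d 4' \
    'baa4s 4' 'cddg1 3' 'cddg3 4' 'cddfg 1' > hhhh

awk '
function print_total() {
	# Print the sum and the average for the group that just ended.
	# %g prints 2 rather than 2.000000 for whole-number averages.
	printf("\nTotal Sum of %s: %d\n", last, total)
	printf("\nTotal Average of %s: %g\n", last, total / count)
}
last != substr($1, 1, 4) {
	if(NR > 1) {
		print_total()
		printf("\n=======================\n")
	}
	last = substr($1, 1, 4)
	total = 0
	count = 0	# Reset the line counter for the new group.
}
{	print
	total += $2
	count++		# Count lines in this group only (NR counts the whole file).
}
END {	if (NR > 0)	# Avoid dividing by zero on an empty input file.
		print_total()
}' hhhh > datarate.log
cat datarate.log
```

With the sample data, the aaa1 group then ends with "Total Sum of aaa1: 6" followed by "Total Average of aaa1: 2", matching the output requested above (6/3).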
