Filter on one column and then perform conditional calculations on another column with a Linux script

03-31-2015

Trust the error messages. In an editor, go to the respective line and analyse the code. The error may still originate in another line, but it's a good starting point.

Why don't you post the error messages?

RudiC

04-01-2015

Ok, so here is the input file I'm using now (sorting it beforehand):

Code:

1.2.3.4 0.01 123 500
1.2.3.4 0.44 123 500
1.2.3.4 0.48 123 500
1.2.3.4 0.52 124 800
1.2.3.5 0.03 44  1500
1.2.3.5 0.08 44  1500
1.2.3.5 0.83 45  80

And the code:

Code:

#!/bin/bash

awk '{ 

$1 != tempIp {
	maxOverallTime = 0
	tempIp = $1
	noSuccPackPerIp=0
	transBytesPerIp=0
	
	while (tempIp == $1){
			transBytesPerIp=0
					
		if($3 != lSeqNo)
		{
			minTime = maxTime = $2
			cnt = 0
			
		
			transBytesForSeqNo = $4

			
			while($3 == lSeqNo) {
				maxTime = $2
				cnt++
				next
			}
			
			if ((maxTime-minTime)>maxOverallTime){
				maxOverallTime=(maxTime-minTime)
			}
			
			if (count<10){
				noSuccPackPerIp++
				transBytesForSeqNo=0
			}
			transBytesPerIp += transBytesForSeqNo
			lSeqNo = $3
			
	
		}
	}
	#printf("%17s %7d %11.3f %f %f %15d\n", tempIp, (maxTime-minTime)/cnt, maxOverallTime, cnt, noSuccPackPerIp, transBytesForSeqNo)
}}' statsSortedX.txt

Have commented out all printouts to just see if I can get the core code to work. These are the error messages:

Code:

awk: line 3: syntax error at or near {
awk: line 42: syntax error at or near }

Thanks for looking at this.

/Z

Zooma

04-01-2015

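If you remove the outer pair of braces (the opening { right after awk ' and the last } before the closing quote), you'll have a syntactically correct awk script that will run. But it also has an infinite loop while processing the 1st line in your input file: inside the while (tempIp == $1) loop, neither tempIp nor $1 ever changes, so the condition stays true forever.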
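
To illustrate the structure only, here is a minimal sketch (not the script from this thread): pattern-action rules sit at the top level of the awk program, and awk itself loops over the input lines, so no while loop over records is needed. The variable names lastIp and n and the three inline sample lines are assumptions made up for this example; it just counts lines per IP.

```shell
#!/bin/sh
# Sketch only: count lines per IP using top-level pattern-action rules.
# lastIp and n are illustrative names, not taken from the original script.
result=$(printf '%s\n' \
    '1.2.3.4 0.01 123 500' \
    '1.2.3.4 0.44 123 500' \
    '1.2.3.5 0.03 44  1500' |
awk '
$1 != lastIp {                       # rule at top level, not nested inside { }
	if (NR > 1) print lastIp, n  # report the previous IP before starting a new one
	lastIp = $1
	n = 0
}
{ n++ }                              # runs for every input line; awk supplies the loop
END { if (NR) print lastIp, n }      # report the final IP
')
printf '%s\n' "$result"
```

With the three sample lines this prints "1.2.3.4 2" followed by "1.2.3.5 1".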

I'm trying to get through your requirements in post #4, and am working on a script to meet those requirements, but I have some other things on my plate right now (so it may be a while before I can post something that works).

It would help if you can post a little more data (showing the results you're trying to get when you have an IP address with unsuccessful retransmissions).

And, please explain what the units are on the timestamps in the 2nd field in your input file. I was assuming that an entry like 0.87 was 87 one hundredths of a second, but you then put a colon in the output and talk about it being minutes and seconds. (But, if that was the case shouldn't the input have been shown as 1:27 instead of as 0.87???)

Don Cragun

04-01-2015

Hi Don,
Sounds fantastic, thanks! I removed the braces and, as you said, it now runs but gets stuck in an infinite loop. I added an extra counter to stop the loop, and I now get some sort of printout, even though it looks quite messy. I will look at that tomorrow.

Here is a bigger (sorted) input file as an example. Note that I have sorted by IP, then sequence number, and then time (one extra sort key compared to the example code I got from you). The timestamps are in hundredths of a second, as you assumed; sorry for the confusion with the colon.

Code:

1.2.3.4 0.01 123 500
1.2.3.4 0.44 123 500
1.2.3.4 0.48 123 500
1.2.3.4 0.52 124 800
1.2.3.4 1.00 125 200
1.2.3.4 1.02 125 200
1.2.3.4 1.08 125 200
1.2.3.4 1.11 125 200
1.2.3.4 1.22 125 200
1.2.3.4 1.40 125 200
1.2.3.4 1.55 126 550
1.2.3.4 1.60 127 400
1.2.3.4 1.70 127 400
1.2.3.4 1.75 128 355
1.2.3.5 0.03 44  1500
1.2.3.5 0.08 44  1500
1.2.3.5 0.83 45  80
1.2.3.5 0.88 45  80
1.2.3.5 0.92 45  80
1.2.3.5 0.96 45  80
1.2.3.5 0.97 45  80
1.2.3.5 0.99 45  80
1.2.3.5 1.03 45  80
1.2.3.5 1.14 46  200
1.2.3.5 1.19 47  480
1.2.3.5 1.20 48  800
1.2.3.5 1.30 48  800

This would result in the following output:

Code:

destIP    avgRetransTime  maxRetransTime  noRetrans  noSuccPack  transBytes
--------  --------------  --------------  ---------  ----------  ----------
1.2.3.4        0.16            0.47           8          5          2605
1.2.3.5        0.07            0.20           8          4          2980

And here is how I derive the numbers per IP:

Code:

avgRetransTime:
1.2.3.4: ((0.48-0.01)+(0.52-0.52)+(1.40-1.00)+(1.55-1.55)+(1.70-1.60)+(1.75-1.75))/6 = 0.16
1.2.3.5: ((0.08-0.03)+(1.03-0.83)+(1.14-1.14)+(1.19-1.19)+(1.30-1.20))/5 = 0.07

Code:

maxRetransTime:
1.2.3.4: 0.47 vs 0.40 vs 0.10 => 0.47
1.2.3.5: 0.05 vs 0.20 vs 0.10 => 0.20

Code:

noRetrans:
1.2.3.4: 8 (seqNo 123 two times, seqNo 125 five times, seqNo 127 once)
1.2.3.5: 8 (seqNo 44 once, seqNo 45 six times, seqNo 48 once)

Code:

noSuccPack:
1.2.3.4: 5 (seqNo 125 sent 6 times, i.e. more than 5 transmissions => unsuccessful)
1.2.3.5: 4 (seqNo 45 sent 7 times, i.e. more than 5 transmissions => unsuccessful)

Code:

transBytes:
1.2.3.4: 500+800+550+400+355 = 2605 (seqNo 125 counted as 'not delivered')
1.2.3.5: 1500+200+480+800 = 2980 (seqNo 45 counted as 'not delivered')
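
As a quick check of the arithmetic above for 1.2.3.4, here is a small sketch. The input lines are hand-copied (seqNo, first timestamp, last timestamp) triples from the sample data, and the names span, sum, max and stats are made up for this example only.

```shell
#!/bin/sh
# Per-sequence-number first and last timestamps for 1.2.3.4,
# copied by hand from the sample input above.
stats=$(printf '%s\n' \
    '123 0.01 0.48' \
    '124 0.52 0.52' \
    '125 1.00 1.40' \
    '126 1.55 1.55' \
    '127 1.60 1.70' \
    '128 1.75 1.75' |
awk '
{
	span = $3 - $2               # time between first and last transmission
	sum += span
	if (span > max) max = span
}
END { printf("%.2f %.2f\n", sum / NR, max) }  # average span, then maximum span
')
printf '%s\n' "$stats"
```

This prints "0.16 0.47", matching avgRetransTime and maxRetransTime for 1.2.3.4.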

Thanks!
/Z

Zooma

04-05-2015

This seems to do what you want, although it uses a slightly different output format:

Code:

#!/bin/ksh
sort -k1,1 -k3,3n -k2,2n stats.txt | awk '
BEGIN { # Perform script initialization steps here...
        # Print output file headers.
	printf("%16s %s %s\n", "", "---Retransmissions---", "Successful")
	printf("%16s %s %s %5s %-10s %s\n",
		"", "Average", "Maximum", "", "  Packet", "Transferred")
	printf("%-16s %-7s %-7s %s %-10s %s\n", " Destination IP", " Time",
	" Time", "Count", "  Count", "   Bytes")
	print "================ ======= ======= ===== ========== ==========="

	# Initialize any variables that should have initial values other than
	# zero or an empty string.
	# No variables need to be set here in this script.
}
function BeginSeqNo() {
	# Initialize values from the 1st record for a new sequence number.
	lastSeqNo = $3
	SeqNoCount = 0
	SeqNoPacketSize = $4
	SeqNoTimeStart = $2
}
function EndSeqNo() {
	# Perform calculations to save results from prior sequence number data
	# lines for this IP address.
	IPCount++
	RetranCount += SeqNoCount - 1
	RetranTime += (SeqNoTime = SeqNoTimeEnd - SeqNoTimeStart)
	if(SeqNoTime > RetranMaxTime)
		RetranMaxTime = SeqNoTime
	if(SeqNoCount <= 5) {
		SuccByteCount += SeqNoPacketSize
		SuccPacketCount++
	}
}
function PrintIP() {
	# Perform calculations and print results for the previous IP address.
	EndSeqNo()
	printf("%-16s %7.3f %7.2f %5d %10d %11d\n", lastIP,
		RetranTime / IPCount, RetranMaxTime, RetranCount,
		SuccPacketCount, SuccByteCount)
}
$1 != lastIP {
	# If there was a header in the input file, it will sort to the end.  If
	# we find the header, we are done reading input.  Whether or not there
	# is a header, the END clause will print the results for the final IP
	# address in the input file.
	if($1 == "destIP")
		exit
	if(NR != 1) {
		# Wrap up calculations for last Sequence number in previous
		# IP and print results for previous IP.
		PrintIP()
	}
	# If we get to this point, this is not the header line, so it must be the
	# 1st record for a new IP address.  Gather data from this record to
	# initialize processing for a new IP address.
	lastIP = $1
	IPCount = RetranCount = RetranTime = RetranMaxTime = SuccByteCount = \
		SuccPacketCount = 0
	# And, initialize data for the 1st sequence number in this new IP
	# address...
	BeginSeqNo()
}
$3 != lastSeqNo {
	# This is the 1st packet in a new sequence number for the current IP
	# address; perform wrap up calculations for the previous sequence number
	# and initialize for the new sequence number.
	EndSeqNo()
	BeginSeqNo()
}
{	# Add data from this line to the data for the current sequence number.
	SeqNoCount++
	SeqNoTimeEnd = $2
}
END {	if(NR) {# If we did not have an empty input file, wrap up calculations
		# and print results for the last IP address in the file.
		PrintIP()
	}
}'

With the sample input from post #11 in this thread, the above script produces the output:

Code:

                 ---Retransmissions--- Successful
                 Average Maximum         Packet   Transferred
 Destination IP   Time    Time   Count   Count       Bytes
================ ======= ======= ===== ========== ===========
1.2.3.4            0.162    0.47     8          5        2605
1.2.3.5            0.070    0.20     8          4        2980

which seems to match the results you requested in post #11.

Although written and tested using the Korn shell, I don't think there is anything in this script that is shell specific.

If someone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
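
One way to handle that without editing the script per system is sketched below. This is only an assumption about how one might wire it up: the AWK and check variable names are made up for the example, and the smoke test simply confirms that the chosen awk accepts user-defined functions, which the script above relies on.

```shell
#!/bin/sh
# Prefer the XPG4 awk when it exists (Solaris/SunOS); otherwise use the awk on PATH.
AWK=awk
if [ -x /usr/xpg4/bin/awk ]; then
	AWK=/usr/xpg4/bin/awk
fi
# Smoke test: the script above uses user-defined functions, so check they work.
check=$(echo 'a b' | "$AWK" 'function second() { return $2 } { print second() }')
printf '%s\n' "$check"
```

If the chosen awk is suitable, this prints "b"; the pipeline in the script would then call "$AWK" instead of awk.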

Don Cragun

04-06-2015

Thanks a lot, Don. It works excellently (on a Raspberry Pi). The comments are also very good and will be helpful in my future attempts to write similar scripts. All the help is much appreciated.

/Z

Zooma
