Filter on one column and then perform conditional calculations on another column with a Linux script


 
# 8  
Old 03-31-2015
Trust the error messages. In an editor, go to the reported line and analyse the code. The error may originate in another line, but it's a good starting point.

Why don't you post the error messages?
This User Gave Thanks to RudiC For This Post:
# 9  
Old 04-01-2015
Ok, so here is the input file I'm using now (sorting it beforehand):

Code:
1.2.3.4 0.01 123 500
1.2.3.4 0.44 123 500
1.2.3.4 0.48 123 500
1.2.3.4 0.52 124 800
1.2.3.5 0.03 44  1500
1.2.3.5 0.08 44  1500
1.2.3.5 0.83 45  80

And the code:

Code:
#!/bin/bash

awk '{ 

$1 != tempIp {
	maxOverallTime = 0
	tempIp = $1
	noSuccPackPerIp=0
	transBytesPerIp=0
	
	while (tempIp == $1){
			transBytesPerIp=0
					
		if($3 != lSeqNo)
		{
			minTime = maxTime = $2
			cnt = 0
			
		
			transBytesForSeqNo = $4

			
			while($3 == lSeqNo) {
				maxTime = $2
				cnt++
				next
			}
			
			if ((maxTime-minTime)>maxOverallTime){
				maxOverallTime=(maxTime-minTime)
			}
			
			if (count<10){
				noSuccPackPerIp++
				transBytesForSeqNo=0
			}
			transBytesPerIp += transBytesForSeqNo
			lSeqNo = $3
			
	
		}
	}
	#printf("%17s %7d %11.3f %f %f %15d\n", tempIp, (maxTime-minTime)/cnt, maxOverallTime, cnt, noSuccPackPerIp, transBytesForSeqNo)
}}' statsSortedX.txt

Have commented out all printouts to just see if I can get the core code to work. These are the error messages:

Code:
awk: line 3: syntax error at or near {
awk: line 42: syntax error at or near }

Thanks for looking at this.

/Z
# 10  
Old 04-01-2015
Quote:
Originally Posted by Zooma
(Post #9 quoted in full: input file, script, and error messages.)

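If you remove the outer pair of braces (the { at the end of the awk '{ line and the last } before the closing quote), you'll have a syntactically correct awk script that will run. But it also has an infinite loop while processing the 1st line in your input file: the while (tempIp == $1) loop never reads another record, so its condition never becomes false.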
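
For illustration, here is a minimal sketch of the structure awk expects. The three sample lines and the names lastIp and count are made up for this example; it only counts lines per IP and is not meant as a solution to your requirements:

```shell
# Sample data inlined for illustration (first three fields only).
out=$(printf '%s\n' \
  '1.2.3.4 0.01 123' \
  '1.2.3.4 0.44 123' \
  '1.2.3.5 0.03 44' |
awk '
# A pattern { action } rule must sit at the top level of the program,
# not inside another pair of braces.
$1 != lastIp {
    if (NR > 1) print lastIp, count   # report the IP that just ended
    lastIp = $1
    count = 0
}
# awk reads the next record by itself once all rules have run, so no
# while loop over $1 is needed inside an action.
{ count++ }
END { if (NR) print lastIp, count }
')
printf '%s\n' "$out"
```

This should print "1.2.3.4 2" followed by "1.2.3.5 1". The point is that awk's implicit loop over the input lines replaces the explicit while loop.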

I'm trying to get through your requirements in post #4, and am working on a script to meet those requirements, but I have some other things on my plate right now (so it may be a while before I can post something that works).

It would help if you can post a little more data (showing the results you're trying to get when you have an IP address with unsuccessful retransmissions).

And, please explain what the units are on the timestamps in the 2nd field in your input file. I was assuming that an entry like 0.87 was 87 one hundredths of a second, but you then put a colon in the output and talk about it being minutes and seconds. (But, if that was the case shouldn't the input have been shown as 1:27 instead of as 0.87???)
This User Gave Thanks to Don Cragun For This Post:
# 11  
Old 04-01-2015
Hi Don,
Sounds fantastic, thanks! I removed the braces and, as you said, it now runs and gets stuck in an infinite loop. I added an additional counter to stop the loop and I get some sort of printout, even though it looks quite messy. Will look at that tomorrow.

Here is a bigger (sorted) input file as an example. Note that I have sorted by IP, then sequence number, and then time (one extra sort key compared to the example code I got from you). The time stamps are in hundredths of a second, as you assumed; sorry for the confusion with the colon.

Code:
1.2.3.4 0.01 123 500
1.2.3.4 0.44 123 500
1.2.3.4 0.48 123 500
1.2.3.4 0.52 124 800
1.2.3.4 1.00 125 200
1.2.3.4 1.02 125 200
1.2.3.4 1.08 125 200
1.2.3.4 1.11 125 200
1.2.3.4 1.22 125 200
1.2.3.4 1.40 125 200
1.2.3.4 1.55 126 550
1.2.3.4 1.60 127 400
1.2.3.4 1.70 127 400
1.2.3.4 1.75 128 355
1.2.3.5 0.03 44  1500
1.2.3.5 0.08 44  1500
1.2.3.5 0.83 45  80
1.2.3.5 0.88 45  80
1.2.3.5 0.92 45  80
1.2.3.5 0.96 45  80
1.2.3.5 0.97 45  80
1.2.3.5 0.99 45  80
1.2.3.5 1.03 45  80
1.2.3.5 1.14 46  200
1.2.3.5 1.19 47  480
1.2.3.5 1.20 48  800
1.2.3.5 1.30 48  800

This would result in the following output:

Code:
destIP   avgRetransTime  maxRetransTime  noRetrans  noSuccPack  transBytes
-------  --------------  --------------  ---------  ----------  ----------
1.2.3.4       0.16            0.47           8          5          2605
1.2.3.5       0.07            0.20           8          4          2980

And here is how I derive the numbers per IP:

Code:
avgRetransTime:
1.2.3.4: ((0.48-0.01)+(0.52-0.52)+(1.40-1.00)+(1.55-1.55)+(1.70-1.60)+(1.75-1.75))/6 = 0.16
1.2.3.5: ((0.08-0.03)+(1.03-0.83)+(1.14-1.14)+(1.19-1.19)+(1.30-1.20))/5 = 0.07

Code:
maxRetransTime:
1.2.3.4: 0.47 vs 0.40 vs 0.10 => 0.47
1.2.3.5: 0.05 vs 0.20 vs 0.10 => 0.20

Code:
noRetrans:
1.2.3.4: 8 (seqNo 123 two times, seqNo 125 five times, seqNo 127 once)
1.2.3.5: 8 (seqNo 44 once, seqNo 45 six times, seqNo 48 once)

Code:
noSuccPack:
1.2.3.4: 5 (seqNo 125 sent 6 times, i.e. more than 5 => unsuccessful)
1.2.3.5: 4 (seqNo 45 sent 7 times, i.e. more than 5 => unsuccessful)

Code:
transBytes:
1.2.3.4: 500+800+550+400+355 = 2605 (seqNo 125 counted as 'not delivered')
1.2.3.5: 1500+200+480+800 = 2980 (seqNo 45 counted as 'not delivered')
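
A quick way to double-check the 1.2.3.4 row is the minimal sketch below. It uses one hand-made summary line per sequence number (seqNo, first time, last time, number of lines, bytes); that summary layout and the variable names are assumptions for this check only, not the real input format. A sequence number seen on more than 5 lines is treated as not delivered, which is the reading that reproduces the 5 and 2605 above.

```shell
# One line per seqNo for 1.2.3.4: seqNo firstTime lastTime lines bytes
# (hand-made summary of the listing above, for illustration only).
out=$(printf '%s\n' \
  '123 0.01 0.48 3 500' \
  '124 0.52 0.52 1 800' \
  '125 1.00 1.40 6 200' \
  '126 1.55 1.55 1 550' \
  '127 1.60 1.70 2 400' \
  '128 1.75 1.75 1 355' |
awk '
{
    span = $3 - $2                  # retransmission time for this seqNo
    total += span
    if (span > max) max = span
    retrans += $4 - 1               # every line after the first is a retransmission
    if ($4 <= 5) {                  # assumed rule: more than 5 lines => not delivered
        succ++
        bytes += $5
    }
}
END { printf("%.2f %.2f %d %d %d\n", total / NR, max, retrans, succ, bytes) }
')
printf '%s\n' "$out"
```

It prints "0.16 0.47 8 5 2605", i.e. avgRetransTime, maxRetransTime, noRetrans, noSuccPack and transBytes for 1.2.3.4.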

Thanks!
/Z
# 12  
Old 04-05-2015
This seems to do what you want, although it uses a slightly different output format:
Code:
#!/bin/ksh
sort -k1,1 -k3,3n -k2,2n stats.txt | awk '
BEGIN { # Perform script initialization steps here...
        # Print output file headers.
	printf("%16s %s %s\n", "", "---Retransmissions---", "Successful")
	printf("%16s %s %s %5s %-10s %s\n",
		"", "Average", "Maximum", "", "  Packet", "Transferred")
	printf("%-16s %-7s %-7s %s %-10s %s\n", " Destination IP", " Time",
	" Time", "Count", "  Count", "   Bytes")
	print "================ ======= ======= ===== ========== ==========="

	# Initialize any variables that should have initial values other than
	# zero or an empty string.
	# No variables need to be set here in this script.
}
function BeginSeqNo() {
	# Initialize values from the 1st record for a new sequence number.
	lastSeqNo = $3
	SeqNoCount = 0
	SeqNoPacketSize = $4
	SeqNoTimeStart = $2
}
function EndSeqNo() {
	# Perform calculations to save results from prior sequence number data
	# lines for this IP address.
	IPCount++
	RetranCount += SeqNoCount - 1
	RetranTime += (SeqNoTime = SeqNoTimeEnd - SeqNoTimeStart)
	if(SeqNoTime > RetranMaxTime)
		RetranMaxTime = SeqNoTime
	if(SeqNoCount <= 5) {
		SuccByteCount += SeqNoPacketSize
		SuccPacketCount++
	}
}
function PrintIP() {
	# Perform calculations and print results for the previous IP address.
	EndSeqNo()
	printf("%-16s %7.3f %7.2f %5d %10d %11d\n", lastIP,
		RetranTime / IPCount, RetranMaxTime, RetranCount,
		SuccPacketCount, SuccByteCount)
}
$1 != lastIP {
	# If there was a header in the input file, it will sort to the end.  If
	# we find the header, we are done: exit jumps to the END clause, which
	# prints the results for the final IP address in the input file.  If
	# there is no header, the END clause does the same after the last line.
	if($1 == "destIP")
		exit
	if(NR != 1) {
		# Wrap up calculations for last Sequence number in previous
		# IP and print results for previous IP.
		PrintIP()
	}
	# If we get to this point, this is not the header line, so it must be the
	# 1st record for a new IP address.  Gather data from this record to
	# initialize processing for a new IP address.
	lastIP = $1
	IPCount = RetranCount = RetranTime = RetranMaxTime = SuccByteCount = \
		SuccPacketCount = 0
	# And, initialize data for the 1st sequence number in this new IP
	# address...
	BeginSeqNo()
}
$3 != lastSeqNo {
	# This is the 1st packet in a new sequence number for the current IP
	# address; perform wrap up calculations for the previous sequence number
	# and initialize for the new sequence number.
	EndSeqNo()
	BeginSeqNo()
}
{	# Add the data from this line to the data for the current sequence number.
	SeqNoCount++
	SeqNoTimeEnd = $2
}
END {	if(NR) {# If we did not have an empty input file, wrap up calculations
		# and print results for the last IP address in the file.
		PrintIP()
	}
}'

With the sample input from post #11 in this thread, the above script produces the output:
Code:
                 ---Retransmissions--- Successful
                 Average Maximum         Packet   Transferred
 Destination IP   Time    Time   Count   Count       Bytes
================ ======= ======= ===== ========== ===========
1.2.3.4            0.162    0.47     8          5        2605
1.2.3.5            0.070    0.20     8          4        2980

which seems to match the results you requested in post #11.

Although written and tested using the Korn shell, I don't think there is anything in this script that is shell specific.

If someone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
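
The sort keys, and the script's assumption that a header line sorts to the end, can be checked in isolation. This is a minimal sketch: the header text is hypothetical (only its first word, destIP, is what the script tests for), and LC_ALL=C is added here as an assumption to make the ordering independent of the locale:

```shell
# Hypothetical header line plus three data lines, deliberately out of order.
sorted=$(printf '%s\n' \
  'destIP time seqNo bytes' \
  '1.2.3.5 0.03 44 1500' \
  '1.2.3.4 0.52 124 800' \
  '1.2.3.4 0.01 123 500' |
LC_ALL=C sort -k1,1 -k3,3n -k2,2n)   # IP, then seqNo (numeric), then time (numeric)
printf '%s\n' "$sorted"
# The last line is the one the awk script would recognise as the header.
last=$(printf '%s\n' "$sorted" | tail -n 1)
```

In the C locale digits collate before letters, so the two 1.2.3.4 lines come first (seqNo 123 before 124), then the 1.2.3.5 line, and the destIP line ends up last.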
This User Gave Thanks to Don Cragun For This Post:
# 13  
Old 04-06-2015
Thanks a lot, Don. Works excellently (on a Raspberry Pi). The comments are also very good and will be helpful in my future attempts to write similar scripts. All help is much appreciated.

/Z