Aggregate data within the file


 
# 1  
Old 11-23-2015
Aggregate data within the file

Guys,
I need to roll up data within the file and build a new file with the output in the same format as the original file.

The data should be rolled up for each unique combination of ord, line, Date, and Hour. The last column, appr, is always "".
Below is the format.

Original File:

Code:
ord,line,Date,Hour,count,appr
1217,1,11/19/2015,"00",8,""
1217,1,11/19/2015,"00",24,""
1217,1,11/19/2015,"08",4,""
1217,1,11/19/2015,"09",20,""
1217,1,11/19/2015,"10",42,""
1217,1,11/19/2015,"10",62,""
 

New File:

Code:
ord,line,Date,Hour,count,appr
1217,1,11/19/2015,"00",32,""
1217,1,11/19/2015,"08",4,""
1217,1,11/19/2015,"09",20,""
1217,1,11/19/2015,"10",104,""

Any thoughts are appreciated.

Last edited by jim mcnamara; 11-23-2015 at 06:04 PM..
# 2  
Old 11-23-2015
Perhaps something like:
Code:
awk '
function printlast() {
	if(last)
		print last, count, appr
	count = 0
}
BEGIN {	FS = OFS = ","
}
FNR == 1 {
	print
	next
}
last != $1 OFS $2 OFS $3 OFS $4 {
	printlast()
	last = $1 OFS $2 OFS $3 OFS $4
	appr = $6
}
{	count += $5
}
END {	printlast()
}' file

would do what you want.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
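
For reference, here is the script above run end-to-end against the sample data from post #1 (the file names `file` and `newfile` are just placeholders):

```shell
# Sample input from post #1, already grouped by ord/line/Date/Hour.
cat > file <<'EOF'
ord,line,Date,Hour,count,appr
1217,1,11/19/2015,"00",8,""
1217,1,11/19/2015,"00",24,""
1217,1,11/19/2015,"08",4,""
1217,1,11/19/2015,"09",20,""
1217,1,11/19/2015,"10",42,""
1217,1,11/19/2015,"10",62,""
EOF

# Aggregate the count column for each run of lines sharing the same key.
awk '
function printlast() {
	if(last)
		print last, count, appr
	count = 0
}
BEGIN {	FS = OFS = ","
}
FNR == 1 { print; next }
last != $1 OFS $2 OFS $3 OFS $4 {
	printlast()
	last = $1 OFS $2 OFS $3 OFS $4
	appr = $6
}
{	count += $5 }
END {	printlast() }' file > newfile

cat newfile
```

This prints the header plus four aggregated rows, matching the "New File" in post #1.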
# 3  
Old 11-23-2015
Don,
Thank you very much for the quick response. It works like a charm. I don't have a whole lot of experience using awk. If you don't mind, can you explain what the code is doing at the line level?
Thanks again!
# 4  
Old 11-23-2015
Does this help?:
Code:
awk '
# Define function to print results for last aggregated data.
function printlast() {
	# If "last" is not empty (which will be true except for the first time
	# this function is called)...
	if(last)
		# print the results.
		print last, count, appr
	# Clear the accumulated count.
	count = 0
}
# Before reading any input files, set the input and output field separators to a
# comma.
BEGIN {	FS = OFS = ","
}
# When we are looking at the 1st line in an input file...
FNR == 1 {
	# copy the header line to the output...
	print
	# and skip to the next input line without executing the remaining lines
	# of this script for this line.
	next
}
# If "last" does not match the first four fields of the current input line...
last != $1 OFS $2 OFS $3 OFS $4 {
	# print the accumulated data for the previous line...
	printlast()
	# set "last" to the first four fields of the current input line...
	last = $1 OFS $2 OFS $3 OFS $4
	# and, set "appr" to the last field on this line.
	appr = $6
}
# Add the count from the current line to the total for lines matching "last".
{	count += $5
}
# After we have processed all lines from the input file, print the accumulated
# data for the last set of aggregated data.
END {	printlast()
}' file

# 5  
Old 11-24-2015
This really helps!
Thanks

---------- Post updated 11-24-15 at 12:19 PM ---------- Previous update was 11-23-15 at 08:05 PM ----------

It looks like the awk command has an issue if the repeated lines are not in order.
For example, running the awk command on the file below doesn't work:

Code:
ord,line,Date,Hour,count,appr
1217,1,11/19/2015,"00",8,""
1217,1,11/19/2015,"08",4,""
1217,1,11/19/2015,"10",62,""
1217,1,11/19/2015,"09",20,""
1217,1,11/19/2015,"00",24,""
1217,1,11/19/2015,"10",42,""

Any thoughts?

Last edited by Franklin52; 11-24-2015 at 03:26 PM.. Reason: Please use code tags
# 6  
Old 11-24-2015
How about prepending { read X; echo "$X"; sort; } < file | to the awk command (and dropping the trailing file name)? The read/echo pair passes the header line through unsorted, and sort brings lines with matching keys together before they are aggregated.
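
Put together with the script from post #2, the whole pipeline might look like this (file names are placeholders; the trailing `file` argument to awk is dropped because the data now arrives on standard input):

```shell
# Sample input from post #5, with duplicate keys out of order.
cat > file <<'EOF'
ord,line,Date,Hour,count,appr
1217,1,11/19/2015,"00",8,""
1217,1,11/19/2015,"08",4,""
1217,1,11/19/2015,"10",62,""
1217,1,11/19/2015,"09",20,""
1217,1,11/19/2015,"00",24,""
1217,1,11/19/2015,"10",42,""
EOF

# read/echo pass the header through untouched; sort groups matching keys.
{ read X; echo "$X"; sort; } < file | awk '
function printlast() {
	if(last)
		print last, count, appr
	count = 0
}
BEGIN {	FS = OFS = ","
}
FNR == 1 { print; next }
last != $1 OFS $2 OFS $3 OFS $4 {
	printlast()
	last = $1 OFS $2 OFS $3 OFS $4
	appr = $6
}
{	count += $5 }
END {	printlast() }' > newfile

cat newfile
```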
# 7  
Old 11-24-2015
If the output order doesn't matter, you could use this simpler awk script:
Code:
awk '
# Before reading any input files, set the input and output field separators to a
# comma.
BEGIN {	FS = OFS = ","
}
# When we are looking at the 1st line in an input file...
FNR == 1 {
	# Copy the header line to the output.
	print
	# and skip to the next input line without executing the remaining lines
	# of this script for this line.
	next
}
# Accumulate data from the remaining lines in the file(s)...
{	# Set "key" to the first four input fields, accumulate the count (field
	# 5) from all lines with that key, and save the last appr field for each
	# key to be printed at the end.
	count[key = $1 OFS $2 OFS $3 OFS $4] += $5
	appr[key] = $6
}
# After we have processed all lines from the input file, print the accumulated
# data.
END {	for(key in count)
		print key, count[key], appr[key]
}' file
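
One caveat, if output order matters after all: the order of a for (key in array) loop in awk is unspecified, so the aggregated rows may come out in any order. A sketch of re-sorting the result afterwards, reusing the read/echo trick from post #6 to keep the header first (file names are placeholders):

```shell
# Sample unordered input from post #5.
cat > file <<'EOF'
ord,line,Date,Hour,count,appr
1217,1,11/19/2015,"00",8,""
1217,1,11/19/2015,"08",4,""
1217,1,11/19/2015,"10",62,""
1217,1,11/19/2015,"09",20,""
1217,1,11/19/2015,"00",24,""
1217,1,11/19/2015,"10",42,""
EOF

# Aggregate with the array-based script; row order is unspecified.
awk '
BEGIN {	FS = OFS = ","
}
FNR == 1 { print; next }
{	count[key = $1 OFS $2 OFS $3 OFS $4] += $5
	appr[key] = $6
}
END {	for(key in count)
		print key, count[key], appr[key]
}' file > newfile

# Keep the header first and sort only the data rows.
{ read X; echo "$X"; sort; } < newfile
```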
