The UNIX and Linux Forums  
#1  06-19-2009  treesloth
awk - Summing a field based on another field

So, I need to do some summing. I have an Apache log file with the following as a typical line:

Code:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Now, what I'd like to do is a per-minute sum. I can have awk print the minute for each line, preserving the date (since this is a multi-day log):

Code:
awk '{print substr($4,2,17)}' logfile
I can have it sum for a single minute:

Code:
awk 'substr($4,2,17) == "10/Oct/2000:13:55" && $9 ~ /^[0-9]+$/ && $10 ~ /^[0-9]+$/ {sum += $10} END {print sum+0}' logfile
...but what I'd really like is a separate sum of $10 for each distinct value of substr($4,2,17). This could be done by looping (foreach/for, etc.) and feeding the minutes into awk one at a time, but that reads the whole file on every pass through the loop. Ideally, I'd like to do it in a single pass. Any suggestions?
#2  06-19-2009  vgersh99 (Moderator)
nawk -f tree.awk myLogFile

tree.awk:
Code:
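# Return the position of the last occurrence of character c in str, or 0.
function rindex(str, c)
{
  return match(str, "\\" c "[^" c "]*$") ? RSTART : 0
}
{
  # $4 looks like "[10/Oct/2000:13:55:36". Skip the leading "[" and stop
  # just before the last ":" so the seconds are dropped.
  idx = substr($4, 2, rindex($4, ":") - 2)
  a[idx] += $NF   # $NF is the last field, the byte count
}
END {
  for (i in a)
    print i " --> " a[i]
}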

Last edited by vgersh99; 06-19-2009 at 08:23 PM. Reason: a lil' bit more succinct
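A quick way to sanity-check a per-minute sum like this is to run a few made-up lines through an inline awk program. The sketch below is a simplified variant rather than the script posted above: it assumes the fixed-width substr($4, 2, 17) key and field $10 from the first post, and the sample lines and the function names sample_log and per_minute are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data in Common Log Format (not taken from a real log).
sample_log() {
  cat <<'EOF'
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
127.0.0.1 - frank [10/Oct/2000:13:55:40 -0700] "GET /index.html HTTP/1.0" 200 100
127.0.0.1 - frank [10/Oct/2000:13:56:02 -0700] "GET /index.html HTTP/1.0" 304 -
EOF
}

# Sum field 10 (bytes sent) per minute in a single pass over the input.
per_minute() {
  awk '{ sum[substr($4, 2, 17)] += $10 }
       END { for (m in sum) print m " --> " sum[m] }'
}

# sort is only here to make the output order repeatable.
sample_log | per_minute | sort
```

With these three lines the 13:55 bucket should come out as 2426 and the 13:56 bucket as 0, since awk treats the "-" byte count of the 304 response as zero when adding.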
#3  06-19-2009  treesloth
Quote:
Originally Posted by vgersh99
nawk -f tree.awk myLogFile...
Oh, wow... that's awesome. Thank you very much. A quick followup question of the sort that I usually hate... If I read that correctly, it doesn't even require that the log entries be in date/time order. Is that so?
#4  06-19-2009  vgersh99 (Moderator)
Quote:
Originally Posted by treesloth
Oh, wow... that's awesome. Thank you very much. A quick followup question of the sort that I usually hate... If I read that correctly, it doesn't even require that the log entries be in date/time order. Is that so?
that's right - this isn't necessary. On the flip side, however, the output will be in unspecified order. The latter can easily be fixed (if necessary - assuming the INCOMING order is temporal).
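One possible way to keep the output in order, sketched under the same assumption that the incoming log is already chronological, is to record each minute the first time it appears and print in that order at the end. The function name ordered_sums, the array names, and the sample lines are hypothetical, and the key uses the fixed-width substr($4, 2, 17) from the first post.

```shell
#!/bin/sh
# Print per-minute sums in the order each minute first appears in the input.
# Assumes the log itself is already in chronological order.
ordered_sums() {
  awk '{
         m = substr($4, 2, 17)            # e.g. "10/Oct/2000:13:55"
         if (!(m in sum)) order[++n] = m  # remember first appearance
         sum[m] += $10
       }
       END { for (i = 1; i <= n; i++) print order[i] " --> " sum[order[i]] }'
}

# Hypothetical sample lines, deliberately spanning midnight.
ordered_sums <<'EOF'
127.0.0.1 - frank [10/Oct/2000:23:59:58 -0700] "GET /a HTTP/1.0" 200 10
127.0.0.1 - frank [11/Oct/2000:00:00:01 -0700] "GET /b HTTP/1.0" 200 20
127.0.0.1 - frank [11/Oct/2000:00:00:30 -0700] "GET /c HTTP/1.0" 200 5
EOF
```

This avoids a separate sort step, which would otherwise have to cope with month names and day-first dates. If the input is not chronological, the output simply follows first appearance rather than time.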
#5  06-19-2009  treesloth
Quote:
Originally Posted by vgersh99
that's right - this isn't necessary. On the flip side, however, the output will be in unspecified order. The latter can easily be fixed (if necessary - assuming the INCOMING order is temporal).
Yeah, that's a quick fix. I already have a sorter script for the log file itself... Replacing 4.* with 1.* throughout should do it.

Again, many thanks. This is a fantastic time saver.
#6  06-20-2009  King Kalyan
Hi vgersh99,

I'm just going through this post and having trouble understanding the code. Could you please explain the rindex function for me? I'm still learning.

Quote:
function rindex(str,c)
{
return match(str,"\\" c "[^" c "]*$")? RSTART : 0
}
Also, why did you use the rindex function and not the following? I'm just curious and want to improve the way I look at problems.

Code:
idx=substr($4, 2, 17)
Thanks in advance!
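
One way to read rindex, offered as a sketch rather than as the author's own explanation: match(str, regex) sets RSTART to the position where the match begins. The regex built here is c followed by any run of non-c characters up to the end of the string, so it can only match starting at the last c. For ":" that is the colon in front of the seconds. Because RSTART is the position of that colon and the substring starts at position 2, the length that stops just before the colon is rindex($4, ":") - 2. The wrapper name show_rindex and the sample field below are hypothetical.

```shell
#!/bin/sh
# Show what rindex() returns for one sample timestamp field, and compare the
# key it yields with the fixed-width substr($1, 2, 17).
show_rindex() {
  awk '
    function rindex(str, c) {
      # Regex is: c, then any run of non-c characters up to end of string,
      # so the match can only start at the LAST c. RSTART is that position.
      return match(str, "\\" c "[^" c "]*$") ? RSTART : 0
    }
    {
      p = rindex($1, ":")
      print "last colon at: " p
      print "by rindex:     " substr($1, 2, p - 2)
      print "fixed width:   " substr($1, 2, 17)
    }'
}

echo '[10/Oct/2000:13:55:36' | show_rindex
```

For this sample the last colon is at position 19 and both approaches give 10/Oct/2000:13:55. The two should agree as long as every timestamp has the same width. The rindex version does not depend on that width, at the cost of an extra regex match per line.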