awk - find average interarrival times for each unique page
All,
I have a test file as specified below. The 1st column is <arrival time> and the 2nd column is <Page #>. I want to find the inter-arrival time of requests for each page # (I've done this part already). Once I have this, I want to calculate the average inter-arrival time. Note that I want the average inter-arrival time of the requests that arrive for each unique page. In other words, I don't want the average inter-arrival time over all of the requests in the trace regardless of page, because that would be trivial to do.
I know how to do the calculation but my problem is I'm not sure what the best way to store these would be. Before I calculate it, I probably need to store all of the inter-arrival times for each unique page first, then I can calculate the average. Or maybe someone knows of an easier way to do this. Here is my example.
My testfile.txt (the file is sorted by Page # (2nd col))
For the average inter-arrival time, I would just add all the interarrival times up for that page and then divide by [the number of requests for that page - 1]. It is minus one because it is the inter-arrival time between 2 requests.
My desired output should be something like this:
Here is the code I have so far.
Thank you in advance for your help!
Jonathan
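On the storage question: one possible approach is to keep only a running sum and a count of inter-arrival times per page, so the individual values never need to be stored. The sketch below assumes the input is sorted by page and then by arrival time within each page. The sample data and the function name avg_interarrival are made up for illustration, and pages with only one request are simply left out of the output.

```shell
# Hypothetical sample data: <arrival time> <page #>, sorted by page.
data='10 1
25 1
40 1
5 2
9 2
7 3'

avg_interarrival() {
  awk '
    # First line of a page: remember its arrival time; no gap yet.
    NR == 1 || $2 != page { page = $2; prev = $1; next }
    # Same page as the previous line: accumulate the gap.
    { sum[page] += $1 - prev; cnt[page]++; prev = $1 }
    END {
      # cnt[p] is the number of gaps, i.e. (requests for page p) - 1.
      for (p in cnt) printf "%s %.2f\n", p, sum[p] / cnt[p]
    }
  ' | sort -n
}

out=$(printf '%s\n' "$data" | avg_interarrival)
printf '%s\n' "$out"
```

The trailing sort -n is only there because awk's "for (p in cnt)" visits pages in no particular order.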
That definitely worked for the small sample file I posted! Thanks. However, I am running this on a very large file and for some reason I am getting negative numbers. I'm guessing it's because I need to account for very large numbers? Do I need to cast some of the variables as float, or handle large values some other way?
I tried this but I'm still getting negative timestamps. Is the inter-arrival calculation happening correctly? It should be interArrivTime=currTime-prevTime (unless currTime is 0...in which case the ArrivTime for that line should just be 0).
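Without the real data this is only a guess, but two things that could produce negative inter-arrival times are (a) subtracting across a page boundary, so prevTime belongs to a different page, and (b) arrival times that are not in ascending order within a page, since the file is only known to be sorted by page. The sketch below sorts by page and then by time, resets prevTime on every page change, and prints the gap with %.0f rather than %d. The sample data and the function name interarrivals are hypothetical.

```shell
# Hypothetical sample: times out of order within page 7, large values on page 8.
data='300 7
100 7
200 7
4000000000 8
4000000500 8'

interarrivals() {
  # Sort numerically by page (col 2), then by arrival time (col 1).
  sort -k2,2n -k1,1n |
  awk '
    # First request of a page: its inter-arrival time is reported as 0.
    NR == 1 || $2 != page { page = $2; prev = $1; print $2, $1, 0; next }
    # Later requests: current time minus previous time on the same page.
    { printf "%s %s %.0f\n", $2, $1, $1 - prev; prev = $1 }
  '
}

out=$(printf '%s\n' "$data" | interarrivals)
printf '%s\n' "$out"
```

Each output line is page, arrival time, inter-arrival time. awk does its arithmetic in double-precision floating point, so no cast is needed for the subtraction itself; the output format is the part that can matter.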
Quote:
Originally Posted by Peasant
For floating point notation you need to use printf with %f in your END block e.g.
Slight modification will display everything as you wish.
---------- Post updated at 03:22 PM ---------- Previous update was at 03:19 PM ----------
Pravin27,
This looks like it's working perfectly! Thank you!
Jonathan
Quote:
Originally Posted by pravin27
Try this,
---------- Post updated at 03:32 PM ---------- Previous update was at 03:22 PM ----------
Thanks everybody for all your help on this...how much harder would it be to also add a 3rd column that gives me the standard deviation for the average inter arrival time for each page?
The formula for standard deviation is:
stand dev = square_root{ Summation[ (x - aveIntArrivTime)^2] / (N-1) }
where
x = each individual inter-arrival time for the page
aveIntArrivTime = the average InterArrivalTime for each page (which we now have)
N = the number of requests for each page
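A minimal sketch of the third column, taking the formula above literally: with N requests per page there are N-1 inter-arrival times, so dividing by (N-1) is the same as dividing by the number of gaps. It keeps a running sum and sum of squares per page instead of storing every value. The sample data is made up, and the sum-of-squares shortcut can lose precision when the gaps are very large and nearly equal, in which case a two-pass version would be safer.

```shell
# Hypothetical sample data: <arrival time> <page #>, sorted by page then time.
data='10 1
25 1
50 1
5 2
9 2'

out=$(printf '%s\n' "$data" | awk '
  # First request of a page: nothing to subtract from yet.
  NR == 1 || $2 != page { page = $2; prev = $1; next }
  # Accumulate count, sum and sum of squares of the gaps per page.
  { d = $1 - prev; n[page]++; sum[page] += d; sumsq[page] += d * d; prev = $1 }
  END {
    for (p in n) {
      mean = sum[p] / n[p]                 # n[p] = N - 1 gaps
      var = sumsq[p] / n[p] - mean * mean  # Summation[(x - mean)^2] / (N - 1)
      if (var < 0) var = 0                 # guard against rounding error
      printf "%s %.2f %.2f\n", p, mean, sqrt(var)
    }
  }' | sort -n)
printf '%s\n' "$out"
```

The columns are page, average inter-arrival time, standard deviation. A page with a single gap gets a standard deviation of 0.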