Completely new to scripting, question/help?


 
# 1  
Old 05-27-2015

I need to write a script that parses our logs and reports the amount of daily activity per user on our website. Unfortunately I'm still learning the very basics, so please bear with me. Below is an example snippet from a log to give you an idea of what each entry looks like (the important parts I want extracted are the date and the username):


Code:
blahblahblahblah- 05-26@09:31:26:235 INFO (blahblahblahblah) - myorganization.api.ApiHandler-0>getID(blahblahblahblah:"","user_info":{"username":"joe@somecompany.com","orgid":"blahblahblahblah"


When somebody performs activity on our site (clicking through different pages, etc.), an entry like the one above is written to the log for each bit of activity. A single log can cover several different days, depending on the activity (logs rotate based on size).

So far I've got this:

Code:
awk -F"\"username\":\"" '{ print $2 }' logs/mycompany.log | awk -F"\"" '{ print $1 }' | sort | uniq -c

This gives me a two-column list that pairs each username with its number of instances (and hence that user's activity). Now I need to associate these counts with the date, so that for any given day the output shows the username, the activity, and the date, and then write that to a .csv file. I'm open to any method. I think it shouldn't be too difficult to modify what I already have, but I'm new to this and not really sure how to do it right.

Last edited by Don Cragun; 05-27-2015 at 01:30 AM.. Reason: Get rid of FONT, COLOR, and SIZE tags; add CODE tags.
# 2  
Old 05-27-2015
Assuming that you want the output sorted by increasing alphanumeric username as the primary sort key and increasing date as the secondary sort key, the following seems to work (assuming your one-line sample is representative of the actual format of your data) without the need for two awk scripts and without the need for uniq:
Code:
awk '
{       split($2, d, "@")
        match($0, /.*"username":"/)
        user = substr($0, RLENGTH + 1)
        user = substr(user, 1, index(user, "\"") - 1)
        c[d[1] OFS user]++
}
END {   for(i in c)
                printf("%4d %s\n", c[i], i)
}' logs/mycompany.log | sort -k3,3 -k2,2

# 3  
Old 05-27-2015
Thanks. Yes, I would like the output sorted by increasing alphanumeric username as the primary sort key, and increasing date and time as the secondary key (I didn't find out until today that they want the time in addition to the date, but it appears right after the date in the logs).

Unfortunately the script you wrote didn't quite work for my actual logs, but that might be because of the snippet I included. I should have included a better sample from the logs; the date and time appear in the same position in every entry, which might make this easier.

I don't know if maybe there is an easier way to do it, or whether awk is the way to go, so if anybody has a better suggestion I'm certainly open to it.

Code:
qtp111659197-5776 - 05-26@09:37:34:240 INFO  (TimingInfoProxy.java:41)     - com.mycompany.api.ApiHandler-0>getUniqueDataBySource(data,{"has_values":false,"last_event_triggered":"","user_info":{"username":"joe@mycompany.com","orgid":"d96467a7-9786-47e1-9c12-bb40f9bfc65d","ip":"127.0.0.1"},"date_range":{"min_date":"","start_date":"","end_date":"","trending_start_date":"","trending_end_date":""},"terms":{"and_filtering":[]}},) 
qtp111659197-5785 - 05-26@09:37:35:100 INFO  (TimingInfoProxy.java:41)     - com.mycompany.api.ApiHandler-0>getDifferentUniqueDataBySource(differentdata,{"has_values":false,"last_event_triggered":"","user_info":{"username":"joe@mycompany.com","orgid":"d96467a7-9786-47e1-9c12-bb40f9bfc65d","ip":"127.0.0.1"},"date_range":{"min_date":"","start_date":"","end_date":"","trending_start_date":"","trending_end_date":""},"terms":{"and_filtering":[]}},)

---------- Post updated at 11:37 AM ---------- Previous update was at 10:44 AM ----------

Also, each entry actually is four lines long when you cat the log, rather than just a single line as it appears when I paste it into here.
# 4  
Old 05-27-2015
Try an adaptation of Don Cragun's fine proposal:
Code:
awk '
        {match($0, /username[^,]*/)
         user = substr($0, RSTART+11, RLENGTH-12)
         c[$3 OFS user]++
        }
END     {for(i in c) printf("%4d %s\n", c[i], i)
        }
' file
   1 05-26@09:37:35:100 joe@mycompany.com
   1 05-26@09:37:34:240 joe@mycompany.com

As there is no sample of your four-line log entries, I can't help with those; you'll need to experiment with three getlines (including error handling) to compose a $0 that the script above can work on.
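For what it's worth, the getline idea could look roughly like the sketch below. It assumes every entry really is split across exactly four physical lines; the short strings a1, a2, and so on are made-up stand-ins for real log lines, and the variable names (entry, line, joined) are only illustrative.

```shell
# Hypothetical input: two complete four-line entries and one truncated entry.
joined=$(printf '%s\n' a1 a2 a3 a4 b1 b2 b3 b4 c1 |
awk '
{       entry = $0
        # Append the next three physical lines to the current one.
        for (n = 1; n <= 3; n++) {
                if ((getline line) <= 0) {
                        # Input ended (or a read error occurred) mid-entry.
                        print "incomplete entry: " entry
                        exit
                }
                entry = entry line
        }
        $0 = entry      # re-split fields so $1, $2, ... refer to the joined entry
        print
}')
printf '%s\n' "$joined"
```

Once $0 has been reassigned, the match() and substr() calls above can run against the joined entry exactly as they would against a single-line one.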
# 5  
Old 05-27-2015
Yes. When counting fields in awk, there is a huge difference between:
Code:
blahblahblahblah-

and:
Code:
blahblahblahblah -
        or
qtp111659197-5776 -

When you want help from a computer scientist, details about input file format are crucial!
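A quick way to see the difference is to ask awk which field holds the timestamp. This is only a sketch: the two strings are shortened copies of the prefixes posted in this thread, and the names first, second, and field_report are made up for the illustration.

```shell
# Shortened copies of the two log prefixes discussed above.
first='blahblahblahblah- 05-26@09:31:26:235 INFO'
second='qtp111659197-5776 - 05-26@09:37:34:240 INFO'

# Report which whitespace-separated field looks like an MM-DD@ timestamp.
field_report=$(printf '%s\n' "$first" "$second" |
awk '{  for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9][0-9]-[0-9][0-9]@/)
                        print "timestamp is in field " i
}')
printf '%s\n' "$field_report"
```

With the stand-alone "-" in the second prefix, the timestamp moves from field 2 to field 3, which is why a script written for the first sample picks up the wrong field on the real log.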

I understood the reason for counting the number of log entries per day for a given user. But, I must be missing the point of counting the number of log entries related to a user on the same date and time. Do you really have multiple log entries for a given user being created in the same millisecond?

Are you really saying that each of the lines shown in your latest sample contains four newline characters? Or (making lots of wild assumptions) are you saying that four lines on your screen are used for each line when you cat the log because each line is somewhere between 360 and 480 characters long (assuming a 120 character line length on your screen) and your terminal is wrapping the output onto 4 screen lines? (Your two sample lines have 430 and 447 characters, respectively, including the terminating newline characters.) Show us the output from the command:
Code:
head -n8 logs/mycompany.log | od -bc

or, if that command fails saying that the -n option isn't recognized:
Code:
head -8 logs/mycompany.log | od -bc

so we have a better chance of understanding your log file format.
# 6  
Old 05-27-2015
Quote:
Originally Posted by Don Cragun
When you want help from a computer scientist, details about input file format are crucial!

Sorry about that. After your initial response I realized that what I had omitted as useless data was in fact important for this task.

Quote:
I understood the reason for counting the number of log entries per day for a given user. But, I must be missing the point of counting the number of log entries related to a user on the same date and time.

Sorry for the confusion on this too. The request was sent to me in an email from a person who doesn't understand how we log information, and it wasn't particularly clear; he's trying to gather info on our customers' patterns when using our site. Now that I stop and think about it, activity per customer per day should be sufficient. Later on they may want timestamps to see if there are times of day when customers are more active, but I'm not worried about that for now.

Quote:
Or (making lots of wild assumptions) are you saying that four lines on your screen are used for each line when you cat the log because each line is somewhere between 360 and 480 characters long (assuming a 120 character line length on your screen) and your terminal is wrapping the output onto 4 screen lines?
You are correct, if I do a
Code:
wc -l

on a sample file containing a complete entry it is in fact a single line.
# 7  
Old 05-27-2015
So, with the new details about your input file format (and using some of RudiC's suggestions to optimize string handling), does:
Code:
awk '
{	split($3, d, "@")
	match($0, /"username":"[^"]*"/)
	user = substr($0, RSTART + 12, RLENGTH - 13)
	c[d[1] OFS user]++
}
END {	for(i in c)
		printf("%4d %s\n", c[i], i)
}' logs/mycompany.log | sort -k3,3 -k2,2

do what you're trying to do?
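Since the original request also mentioned a .csv file, here is a minimal sketch of one way to get there. It assumes the log layout shown in the samples earlier in this thread; the sample lines are shortened, the third line (user ann, 05-27) is invented test data, and the column order date,username,count is just a guess at what is wanted.

```shell
# Hypothetical sample log: shortened entries in the format posted earlier.
log=$(mktemp)
cat > "$log" <<'EOF'
qtp111659197-5776 - 05-26@09:37:34:240 INFO  (TimingInfoProxy.java:41)     - com.mycompany.api.ApiHandler-0>getUniqueDataBySource(data,{"user_info":{"username":"joe@mycompany.com","orgid":"x"}},)
qtp111659197-5785 - 05-26@09:37:35:100 INFO  (TimingInfoProxy.java:41)     - com.mycompany.api.ApiHandler-0>getDifferentUniqueDataBySource(differentdata,{"user_info":{"username":"joe@mycompany.com","orgid":"x"}},)
qtp111659197-5790 - 05-27@10:01:02:003 INFO  (TimingInfoProxy.java:41)     - com.mycompany.api.ApiHandler-0>getUniqueDataBySource(data,{"user_info":{"username":"ann@mycompany.com","orgid":"x"}},)
EOF

# Count entries per date and user; lines without a username are skipped.
csv=$(awk '
match($0, /"username":"[^"]*"/) {
        split($3, d, "@")                       # d[1] is the date part of $3
        user = substr($0, RSTART + 12, RLENGTH - 13)
        c[d[1] "," user]++
}
END {   for (i in c)
                print i "," c[i]
}' "$log" | sort -t, -k2,2 -k1,1)               # username first, then date

# Header line followed by the sorted rows.
printf 'date,username,count\n%s\n' "$csv"
rm -f "$log"
```

Redirecting the final printf to a file (for example > activity.csv, a made-up name) gives the .csv. Usernames containing commas or double quotes would need proper CSV quoting, which this sketch does not attempt.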