Parse log file to insert into database


 
# 1  
Old 08-16-2015
Parse log file to insert into database

I have a log file that's created daily by this command:

Code:
sar -u 300 288 >> /var/log/usage/$(date "+%Y-%m-%d")_$(hostname)_cpu.log

It contains data like this:

Code:
Linux 3.16.0-4-amd64 (myhostname)       08/15/2015      _x86_64_        (1 CPU)

11:34:17 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:39:17 PM     all      0.09      0.00      0.07      0.09      0.00     99.75
11:44:17 PM     all      0.04      0.00      0.03      0.00      0.00     99.92
11:49:17 PM     all      1.49      0.00      0.49      0.06      0.00     97.96
11:54:17 PM     all     23.27      0.00      0.51      0.05      0.03     76.14


11:56:12 PM     all      0.17      0.00      0.13      0.01      0.01     99.69
Average:        all      5.69      0.00      0.26      0.05      0.01     93.99


I'm not sure whether there's a way to strip out the data I don't need (such as %nice, %iowait, and %steal, as well as the header and footer information) before it is written to the file. If there is, the rest of this post is irrelevant.

I need to ignore the first few rows and the last row. I also need to ignore any blank lines.

The only data I need to insert into the database is the time, %user, %system, and %idle data.
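
One way to drop the unwanted lines before they ever reach the log is to pipe sar through awk. This is only a minimal sketch, assuming the 12-hour sar layout shown above; the function name filter_sar is made up for illustration.

```shell
# filter_sar: hypothetical helper. Keeps only the data rows of `sar -u`
# output (field 2 is AM or PM, field 3 is not the CPU column header) and
# prints time, AM/PM, %user, %system, %idle.
filter_sar() {
    # fflush() pushes each row out right away, since sar only emits a line
    # every few minutes.
    awk '$2 ~ /^[AP]M$/ && $3 != "CPU" { print $1, $2, $4, $6, $9; fflush() }'
}

# Demonstration on three lines of the sample data:
printf '%s\n' \
    '11:34:17 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle' \
    '11:39:17 PM     all      0.09      0.00      0.07      0.09      0.00     99.75' \
    'Average:        all      5.69      0.00      0.26      0.05      0.01     93.99' |
    filter_sar
# prints: 11:39:17 PM 0.09 0.07 99.75
```

In the logging command this would sit between sar and the redirection, as in sar -u 300 288 | filter_sar >> logfile. Some awk implementations buffer their input when reading from a pipe, so rows may still show up in the file with a delay.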

The table looks like this:

Code:
row_id (int) | date (date) | time (time) | hostname (varchar) | user (decimal) | system (decimal) | idle (decimal)

The log file name is in YYYY-MM-DD_hostname_cpu.log format. I need to extract the date from the filename and insert it into the table as well.

So far I have

Code:
# Skips the three header lines; blank lines and the Average: row are not yet filtered out
awk 'NR > 3 { printf "%s,%s,%s\n", $4, $6, $9 }' YYYY-MM-DD_hostname_cpu.log

I then need to insert it into a MySQL database. I'm assuming some type of while loop would be needed here.

Last edited by unplugme71; 08-16-2015 at 01:30 PM..
# 2  
Old 08-16-2015
To create a pipe symbol (|) separated values file containing data from all of the files in a directory with names ending in .log, you could use the following:
Code:
awk '
BEGIN {	OFS = "|"
}
FNR == 1 {
	d = substr(FILENAME, 1, 10)
}
$2 ~ /^[AP]M$/ && $3 != "CPU" {
	print FNR, d, $1 " " $2, $4, $6, $9
}' *.log

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

If there are files named 2015-08-14_hostname_cpu.log and 2015-08-15_hostname_cpu.log in a directory where you run the above script and each of those files contained the sample data shown in your 1st post in this thread, it produces the output:
Code:
4|2015-08-14|11:39:17 PM|0.09|0.07|99.75
5|2015-08-14|11:44:17 PM|0.04|0.03|99.92
6|2015-08-14|11:49:17 PM|1.49|0.49|97.96
7|2015-08-14|11:54:17 PM|23.27|0.51|76.14
10|2015-08-14|11:56:12 PM|0.17|0.13|99.69
4|2015-08-15|11:39:17 PM|0.09|0.07|99.75
5|2015-08-15|11:44:17 PM|0.04|0.03|99.92
6|2015-08-15|11:49:17 PM|1.49|0.49|97.96
7|2015-08-15|11:54:17 PM|23.27|0.51|76.14
10|2015-08-15|11:56:12 PM|0.17|0.13|99.69

I will leave it to you to redirect the output from the above script into a file you can use to load your database or pipe the output directly into a mysql statement to load your database.
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 08-16-2015
IMO, if you are only going to record %user, %system, and %idle, you cannot simply discard the other columns; you need to fold them into the columns you keep.

For example, %iowait is time the CPU spends idle while I/O is outstanding. It is still idle time, so it should be added to %idle; otherwise the numbers do not add up to 100%.
In your sample the percentages are low, but there are situations where they may be significant.

I think you should use this:
Code:
%total_user = %user + %nice 
%total_idle = %iowait + %steal + %idle
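
As a quick arithmetic check of that mapping, here is a small sketch that merges the columns of a single sar data row and prints the total. The function name merge_cpu is made up for illustration; the field positions assume the 12-hour layout from the first post.

```shell
# merge_cpu: hypothetical helper. Folds %nice into user time and
# %iowait/%steal into idle time, then prints user, system, idle and
# their sum for each sar data row read from standard input.
merge_cpu() {
    awk '{
        u = $4 + $5          # %user + %nice
        s = $6               # %system
        i = $7 + $8 + $9     # %iowait + %steal + %idle
        printf "%.2f %.2f %.2f %.2f\n", u, s, i, u + s + i
    }'
}

echo '11:54:17 PM     all     23.27      0.00      0.51      0.05      0.03     76.14' | merge_cpu
# prints: 23.27 0.51 76.22 100.00
```

With only %user, %system, and %idle the same row would sum to 99.92, which is the gap described above.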

So, adjusting Don's suggestion that would mean:
Code:
	print FNR, d, $1 " " $2, $4+$5, $6, $7+$8+$9

Which produces the output:
Code:
4|2015-08-14|11:39:17 PM|0.09|0.07|99.84
5|2015-08-14|11:44:17 PM|0.04|0.03|99.92
6|2015-08-14|11:49:17 PM|1.49|0.49|98.02
7|2015-08-14|11:54:17 PM|23.27|0.51|76.22
10|2015-08-14|11:56:12 PM|0.17|0.13|99.71
4|2015-08-15|11:39:17 PM|0.09|0.07|99.84
5|2015-08-15|11:44:17 PM|0.04|0.03|99.92
6|2015-08-15|11:49:17 PM|1.49|0.49|98.02
7|2015-08-15|11:54:17 PM|23.27|0.51|76.22
10|2015-08-15|11:56:12 PM|0.17|0.13|99.71

This User Gave Thanks to Scrutinizer For This Post:
# 4  
Old 08-16-2015
Thank you both for your replies. The OS I'm using is Debian 8 x64.

Thank you again for this information. I was not aware that the other columns were part of the overall CPU utilization equation.

---------- Post updated at 11:15 AM ---------- Previous update was at 11:09 AM ----------

There will be other log files in this directory that are used for another purpose. Could I use *_cpu.log instead? Also, is there any way to convert the time to a 24-hour clock that MySQL understands? I assume I would need to trim off the AM or PM and, for PM times, add 12 hours somehow?

I have created a script that processes the output and modified it a little bit. I just need to fix the time and I should be good to go. Here's the current output.

Code:
2015-08-15,11:39:17PM,0.09,0.07,99.84
2015-08-15,11:44:17PM,0.04,0.03,99.92
2015-08-15,11:49:17PM,1.49,0.49,98.02
2015-08-15,11:54:17PM,23.27,0.51,76.22
2015-08-15,11:56:12PM,0.17,0.13,99.71


Edit: I forgot to add the hostname column. How can I add that to the awk script? The hostname can vary in length, but it will always sit between the date_ prefix and the _cpu.log suffix of the filename.

Last edited by unplugme71; 08-16-2015 at 01:31 PM..
# 5  
Old 08-16-2015
Hi, modifying Don's suggestion, you could try including the hostname with something like this:
Code:
awk '
  BEGIN {
    OFS = "|"
  }
  FNR == 1 {
    split(FILENAME,F,/_/)
    d=F[1]
    h=F[2]
  }
  $2 ~ /^[AP]M$/ && $3 != "CPU" {
    print FNR, d, h, $1 " " $2, $4+$5, $6, $7+$8+$9
  }
' *_cpu.log
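
One caveat with splitting FILENAME on underscores: if the files are passed with a directory prefix (for example /var/log/usage/*_cpu.log), the first piece also contains the path, and a hostname that itself contains an underscore would be cut short. A rough sketch of a more defensive parse follows; the function name parse_name is made up for illustration, and it assumes the date is always the first 10 characters of the base name.

```shell
# parse_name: hypothetical helper. Prints "date|hostname" for a path such
# as /var/log/usage/2015-08-15_myhostname_cpu.log.
parse_name() {
    awk -v path="$1" 'BEGIN {
        name = path
        sub(/.*\//, "", name)        # drop any leading directories
        sub(/_cpu\.log$/, "", name)  # drop the fixed suffix
        d = substr(name, 1, 10)      # YYYY-MM-DD is always 10 characters
        h = substr(name, 12)         # everything after "date_"
        print d "|" h
    }'
}

parse_name /var/log/usage/2015-08-15_myhostname_cpu.log
# prints: 2015-08-15|myhostname
```

Inside the main script, the same two sub() calls and substr() calls could be applied to a copy of FILENAME in the FNR == 1 block.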

This User Gave Thanks to Scrutinizer For This Post:
# 6  
Old 08-16-2015
Awesome! That worked.

---------- Post updated at 12:38 PM ---------- Previous update was at 12:06 PM ----------

What about time conversion from 12 to 24hr?
# 7  
Old 08-16-2015
So, using comma instead of the pipe symbol, and getting rid of the space between the time stamp and the "AM" or "PM", we have:
Code:
awk '
BEGIN {	OFS = ","
}
FNR == 1 {
	split(FILENAME, F, /_/)
	d=F[1]
	h=F[2]
}
$2 ~ /^[AP]M$/ && $3 != "CPU" {
	print FNR, d, h, $1 $2, $4+$5, $6, $7+$8+$9
}' *_cpu.log

and, if you want a 24 hour clock time instead of AM/PM notation, try:
Code:
awk '
BEGIN {	OFS = ","
}
FNR == 1 {
	split(FILENAME, F, /_/)
	d=F[1]
	h=F[2]
}
$2 ~ /^[AP]M$/ && $3 != "CPU" {
	split($1, HMS, /:/)
	# 12 AM is hour 0 and 12 PM is hour 12, so reset 12 before adding the PM offset
	if (HMS[1] == 12) HMS[1] = 0
	if ($2 == "PM") HMS[1] += 12
	t = sprintf("%02d:%s:%s", HMS[1], HMS[2], HMS[3])
	print FNR, d, h, t, $4+$5, $6, $7+$8+$9
}' *_cpu.log
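
To get from this comma-separated output into the database, one possible approach is to turn each row into an INSERT statement. This is only a sketch under assumptions: the table name cpu_usage is made up, the column names are taken from the table layout in the first post, and row_id is assumed to be filled in by the database, so the leading FNR field is ignored.

```shell
# csv_to_sql: hypothetical helper. Reads rows of the form
#   FNR,date,hostname,time,user,system,idle
# and prints one INSERT statement per row. \047 is a single quote.
csv_to_sql() {
    awk -F, '{
        printf "INSERT INTO cpu_usage (`date`, `time`, `hostname`, `user`, `system`, `idle`) "
        printf "VALUES (\047%s\047, \047%s\047, \047%s\047, %s, %s, %s);\n", $2, $4, $3, $5, $6, $7
    }'
}

echo '4,2015-08-15,myhostname,23:39:17,0.09,0.07,99.84' | csv_to_sql
```

The generated statements could then be piped into the mysql command-line client, which reads SQL from standard input, with whatever user and database names apply on your system. The values are not escaped, so this is only suitable for trusted input such as your own sar logs.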

This User Gave Thanks to Don Cragun For This Post: