Using awk to Parse File


 
# 1  
Old 03-26-2014

Hi all, I have a file that contains a good hundred of these job definitions below:

Code:
Job Name                                                         Last Start           Last End             ST Run     Pri/Xit
________________________________________________________________ ____________________ ____________________ __ _______ ___
B9043CC_APP_DMLD_025_FR_xpabbdu1_D                               03/12/2014 18:21:32  03/12/2014 18:22:07  SU 49744331/3

  Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
  --------------  --------------------- --  --  --------------------- ----------------------------------------
  [FORCE_STARTJOB]  03/12/2014 17:30:52    0  PD  03/12/2014 17:30:53
    < >
  STARTING        03/12/2014 17:30:53    1  PD  03/12/2014 17:30:53   machine.enviorment.net
  RUNNING         03/12/2014 17:31:06    1  PD  03/12/2014 17:31:07   machine.enviorment.net
  SUCCESS         03/12/2014 17:31:46    1  PD  03/12/2014 17:31:47
  [FORCE_STARTJOB]  03/12/2014 18:16:06    0  PD  03/12/2014 18:16:07
    < >
  STARTING        03/12/2014 18:16:07    2  PD  03/12/2014 18:16:07   machine.enviorment.net
  RUNNING         03/12/2014 18:16:19    2  PD  03/12/2014 18:16:20   machine.enviorment.net
  FAILURE         03/12/2014 18:17:02    2  PD  03/12/2014 18:17:03
  [*** ALARM ***]
    JOBFAILURE    03/12/2014 18:17:03    2  PD  03/12/2014 18:17:04
  [FORCE_STARTJOB]  03/12/2014 18:21:18    0  PD  03/12/2014 18:21:19
    < >
  STARTING        03/12/2014 18:21:19    3  PD  03/12/2014 18:21:19   machine.enviorment.net
  RUNNING         03/12/2014 18:21:32    3  PD  03/12/2014 18:21:32   machine.enviorment.net
  SUCCESS         03/12/2014 18:22:07    3  PD  03/12/2014 18:22:08

The actual start/end times and actual start/end dates come from the "ProcessTime" column. I only want the data itself; none of the surrounding text, including the "----" separator lines, should appear anywhere in the file I output it to. As mentioned above, I have a few hundred of these definitions in a single file.
This was something I was originally doing in Python, and I am now going to try to do it using awk.
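
To illustrate the ProcessTime point, here is a minimal sketch in POSIX awk. It assumes status lines are whitespace-separated exactly as in the sample above, so that the ProcessTime date and time land in fields 6 and 7 while the Time column is fields 2 and 3. The function name `process_times` is made up for this example and is not part of any tool.

```shell
# Hypothetical helper: read the report on stdin and print the status plus
# the ProcessTime date and time for STARTING/SUCCESS/FAILURE lines.
process_times() {
    awk '
        # Assumed layout of a status line (taken from the sample data):
        #   $1 status, $2-$3 Time, $4 Ntry, $5 ES, $6-$7 ProcessTime, $8 machine
        # Event lines such as [FORCE_STARTJOB] and "< >" do not match $1 here.
        $1 ~ /^(STARTING|SUCCESS|FAILURE)$/ && NF >= 7 { print $1, $6, $7 }
    '
}
```

It could be called as `process_times < /dir/filepath/input.txt`. Leading spaces on the status lines do not matter, because awk's default field splitting ignores them.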

I know to read in the file it would be:

Code:
awk '{ print }' /dir/filepath/input.txt

And for the output file, I need:

Code:
System Number  Job Name                           Target Machine     Status     Actual Start Date     Actual Start Time      Actual End Date    Actual End Time
9043           B9043CC_APP_DMLD_025_FR_xpabbdu1_D machine.enviorment.net    SUCCESS       03/12/2014               17:30:53            03/12/2014         17:31:47
9043           B9043CC_APP_DMLD_025_FR_xpabbdu1_D machine.enviorment.net    FAILURE       03/12/2014               18:16:07            03/12/2014         18:17:03
9043           B9043CC_APP_DMLD_025_FR_xpabbdu1_D machine.enviorment.net    SUCCESS       03/12/2014               18:21:19            03/12/2014         18:22:08

Code:
> /dir/filepath/output.txt

However, I'm looking for help with the parsing itself.

Last edited by atticuss; 03-28-2014 at 12:12 PM.. Reason: code tags not HTML...
# 2  
Old 03-26-2014
And where are we going to find the system number?
# 3  
Old 03-26-2014
Quote:
Originally Posted by vbe
And where are we going to find the system number?
Under the "Job Name" column:
Code:
Job Name                                                         Last Start           Last End             ST Run     Pri/Xit
________________________________________________________________ ____________________ ____________________ __ _______ ___
B9043CC_APP_DMLD_025_FR_xpabbdu1_D                               03/12/2014 18:21:32  03/12/2014 18:22:07  SU 49744331/3

# 4  
Old 03-26-2014
Can you tell us a bit more about the input file format?
I understand this is an extract. Does it correspond to a specific job log, or might we find the same job again later?
Anything that clarifies how the parsing should work would help, e.g.:
Will the string containing the system number always be on the 3rd line after the line matching "^Job Name "?
# 5  
Old 03-26-2014
Quote:
Originally Posted by vbe
Can you tell us a bit more about the input file format?
I understand this is an extract. Does it correspond to a specific job log, or might we find the same job again later?
Anything that clarifies how the parsing should work would help, e.g.:
Will the string containing the system number always be on the 3rd line after the line matching "^Job Name "?
Sure!

Here is the general logic:

Code:
Read six lines (header)
Get system number and batch name

Until end of file:
    Read five lines
    Get machine name, status, start and end dates and times
    If status is FAILURE
        Read two lines (clear error message)

No duplicate job names will be present; however, different jobs can contain the same system number.
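
The "get system number and batch name" step of the logic above can be sketched without counting lines. This is only a sketch in POSIX awk, and it rests on an assumption drawn from the samples: a job line is any line whose first field starts with a capital letter followed by digits (for example `B9043CC_...`), and the system number is the first run of digits in that field. The function name `job_ids` is hypothetical.

```shell
# Hypothetical helper: print "<system number> <job name>" for each job line
# found on stdin.
job_ids() {
    awk '
        # Assumption: job names look like B9043CC_..., i.e. one capital letter
        # followed by digits. Headers, separators and status lines do not.
        $1 ~ /^[A-Z][0-9]+/ {
            match($1, /[0-9]+/)                    # sets RSTART and RLENGTH
            print substr($1, RSTART, RLENGTH), $1  # e.g. "9043 B9043CC_..."
        }
    '
}
```

Matching on the first field rather than on the start of the line means a job line is still recognised when it is indented by a space.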

Also, some jobs may not have run on that specific day, so there will be no data in them. In this case the fields in the output file would just be empty or null.

For example:

Code:
Job Name                     Last Start           Last End             ST Run     Pri/Xit
____________________________ ____________________ ____________________ __ _______ ___
B9043CC_uwsprem_l_thd013sv_D 08/04/2010 22:03:55  03/05/2012 07:51:33  OI 22333537/0    

  Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
  --------------  --------------------- --  --  --------------------- -------

B9043CC_uwsprem_l_thd024sv_D 03/06/2012 22:00:34  03/06/2012 22:00:42  OI 22333536/1    

  Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
  --------------  --------------------- --  --  --------------------- -------

B9043BC_bond_ba_mf_loss_thd013sv_D                               03/06/2012 08:54:11  03/06/2012 11:44:06  OI 22303721/1    

  Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
  --------------  --------------------- --  --  --------------------- ----------------------------------------
  [STARTJOB]      03/19/2014 17:45:00    0  PD  03/19/2014 17:45:00   
    <Event was Scheduled based on Job Definition.>

 B9043CC_bcmsloss_l_thd013sv_D                                   03/21/2014 08:46:48  03/21/2014 10:38:31  SU 22303721/110    

   Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
   --------------  --------------------- --  --  --------------------- ----------------------------------------
   SUCCESS         03/19/2014 14:04:49   108  PD  03/19/2014 14:04:49   
   [FORCE_STARTJOB]  03/20/2014 13:39:15    0  PD  03/20/2014 13:39:15   
     < >
   STARTING        03/20/2014 13:39:15   109  PD  03/20/2014 13:39:16   machine.enviorment.net
   RUNNING         03/20/2014 13:39:17   109  PD  03/20/2014 13:39:17   machine.enviorment.net
   SUCCESS         03/20/2014 14:24:56   109  PD  03/20/2014 14:24:56   
   [FORCE_STARTJOB]  03/21/2014 08:46:47    0  PD  03/21/2014 08:46:47   
     < >
   STARTING        03/21/2014 08:46:47   110  PD  03/21/2014 08:46:48   tmachine.enviorment.net
   RUNNING         03/21/2014 08:46:48   110  PD  03/21/2014 08:46:49   machine.enviorment.net
   SUCCESS         03/21/2014 10:38:31   110  PD  03/21/2014 10:38:31


Last edited by atticuss; 03-28-2014 at 12:07 PM..
# 6  
Old 03-26-2014
... based off your original data ...
Code:
gawk '
	BEGIN {
		printf("%-14s %-65s %-41s %-8s %-18s %-18s %-16s %-16s\n",
		"System Number","Job Name","Target Machine","Status","Actual Start Date",
		"Actual Start Time","Actual End Date","Actual End Time") }
	$0~/^[A-Z]/ {
		match($1,/([0-9]+)/,s); s[2]=$1 }
	$1=="STARTING" {
		if(!s[3]) s[3]=$8; s[5]=$2; s[6]=$3 }
	$1~/^(SUCCESS|FAILURE)$/ {
		printf("%-14s %-65s %-41s %-8s %-18s %-18s %-16s %-16s\n",
		s[1], s[2], s[3], $1, s[5], s[6], $2, $3) }
' your_file

# 7  
Old 03-27-2014
Quote:
Originally Posted by jethrow
... based off your original data ...
Code:
gawk '
	BEGIN {
		printf("%-14s %-65s %-41s %-8s %-18s %-18s %-16s %-16s\n",
		"System Number","Job Name","Target Machine","Status","Actual Start Date",
		"Actual Start Time","Actual End Date","Actual End Time") }
	$0~/^[A-Z]/ {
		match($1,/([0-9]+)/,s); s[2]=$1 }
	$1=="STARTING" {
		if(!s[3]) s[3]=$8; s[5]=$2; s[6]=$3 }
	$1~/^(SUCCESS|FAILURE)$/ {
		printf("%-14s %-65s %-41s %-8s %-18s %-18s %-16s %-16s\n",
		s[1], s[2], s[3], $1, s[5], s[6], $2, $3) }
' your_file


Thanks! I just ran the script against the data below. I have multiple of these jobs in one file, and every job has a different job name, which I want to grab even if the job did not run. It is my fault for not mentioning this in the original post. The script is only pulling the first job name it sees and repeating it for every entry, and I am trying to modify that.


Code:
B3709BC_GCFCT_MONTHLY_tpabbtu1_D                                 03/12/2014 09:13:23  03/13/2014 00:43:10  FA 54759595/1 1  

  Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
  --------------  --------------------- --  --  --------------------- ----------------------------------------
  RUNNING         03/12/2014 09:13:23    1  PD  03/12/2014 09:13:24    
  FAILURE         03/13/2014 00:43:10    1  PD  03/13/2014 00:43:11   
  [STARTJOB]      03/26/2014 18:45:00    0  UP                        
    <Event was Scheduled based on Job Definition.>

 B3709CC_GCFCT_MONTHLY_VALIDATION_tpabbtu1_D                     03/12/2014 10:59:52  03/12/2014 11:01:11  SU 54759595/1    

   Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
   --------------  --------------------- --  --  --------------------- ----------------------------------------
   [FORCE_STARTJOB]  03/12/2014 10:59:46    0  PD  03/12/2014 10:59:46   
     < >
   STARTING        03/12/2014 10:59:46    1  PD  03/12/2014 10:59:46   machine.enviorment.net
   RUNNING         03/12/2014 10:59:52    1  PD  03/12/2014 10:59:52    machine.enviorment.net
   SUCCESS         03/12/2014 11:01:11    1  PD  03/12/2014 11:01:11   

 B3709CC_GCFCT_Monthly_LKUP_Creation_tpabbtu1_D                  03/12/2014 10:24:43  03/12/2014 10:27:57  SU 54759595/1    

   Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
   --------------  --------------------- --  --  --------------------- ----------------------------------------
   [FORCE_STARTJOB]  03/12/2014 10:24:37    0  PD  03/12/2014 10:24:37   
     < >
   STARTING        03/12/2014 10:24:37    1  PD  03/12/2014 10:24:38  machine.enviorment.net
   RUNNING         03/12/2014 10:24:43    1  PD  03/12/2014 10:24:44   machine.enviorment.net
   SUCCESS         03/12/2014 10:27:57    1  PD  03/12/2014 10:27:58   

 B3709CC_GCFCT_IP_Target_Load_tpabbtu1_D                         04/11/2013 15:42:10  04/11/2013 15:45:31  IN 39115173/0    

   Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
   --------------  --------------------- --  --  --------------------- ----------------------------------------

 B3709CC_GCFCT_ERROR_PROCESSING_tpabbtu1_D                       04/11/2013 15:45:41  04/11/2013 16:45:42  IN 39115173/0    

   Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
   --------------  --------------------- --  --  --------------------- ----------------------------------------

Output:
Code:
System Number  Job Name                                                          Target Machine                            Status   Actual Start Date  Actual Start Time  Actual End Date  Actual End Time 
3709           B3709BC_GCFCT_MONTHLY_tpabbtu1_D                                                                            FAILURE                                        03/13/2014       00:43:10        
3709           B3709BC_GCFCT_MONTHLY_tpabbtu1_D                                  machine.enviorment.net     SUCCESS  03/12/2014         10:59:46           03/12/2014       11:01:11        
3709           B3709BC_GCFCT_MONTHLY_tpabbtu1_D                                  machine.enviorment.net     SUCCESS  03/12/2014         10:24:37           03/12/2014       10:27:57

Target output:

Code:
System Number  Job Name                                                          Target Machine                            Status   Actual Start Date  Actual Start Time  Actual End Date  Actual End Time 
3709           B3709BC_GCFCT_MONTHLY_tpabbtu1_D                                                                            FAILURE                                        03/13/2014       00:43:10        
3709           B3709CC_GCFCT_MONTHLY_VALIDATION_tpabbtu1_D                     machine.enviorment.net     SUCCESS  03/12/2014         10:59:46           03/12/2014       11:01:11        
3709           B3709CC_GCFCT_Monthly_LKUP_Creation_tpabbtu1_D                  machine.enviorment.net     SUCCESS  03/12/2014         10:24:37           03/12/2014       10:27:57        
3709           B3709CC_GCFCT_IP_Target_Load_tpabbtu1_
3709           B3709CC_GCFCT_ERROR_PROCESSING_tpabbtu1_D
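
One possible way to get closer to the target output, sketched in POSIX awk rather than gawk. It is not the script posted earlier in the thread, and it makes several assumptions: job lines are recognised by a first field that starts with a capital letter followed by digits, status lines are laid out as in the samples (ProcessTime in fields 6 and 7, machine in field 8), and start/end values are taken from ProcessTime rather than Time. The names `parse_jobs`, `emit` and `flush_job` are made up for this example.

```shell
# Hypothetical wrapper: reads the report on stdin, writes the table to stdout.
parse_jobs() {
    awk '
        # Print one output row and reset the per-attempt state.
        function emit(status, edate, etime) {
            printf fmt, sysnum, job, machine, status, sdate, stime, edate, etime
            rows++
            machine = sdate = stime = ""
        }
        # A job that produced no SUCCESS/FAILURE row still gets one line.
        function flush_job() {
            if (job != "" && rows == 0) emit("", "", "")
        }
        BEGIN {
            fmt = "%-14s %-65s %-41s %-8s %-18s %-18s %-16s %-16s\n"
            printf fmt, "System Number", "Job Name", "Target Machine", "Status",
                "Actual Start Date", "Actual Start Time",
                "Actual End Date", "Actual End Time"
        }
        # Job line (assumed pattern); leading blanks are ignored by awk.
        $1 ~ /^[A-Z][0-9]+/ {
            flush_job()
            job = $1
            match(job, /[0-9]+/)
            sysnum = substr(job, RSTART, RLENGTH)
            rows = 0
            machine = sdate = stime = ""
            next
        }
        # Remember where and when the current attempt started.
        $1 == "STARTING" { sdate = $6; stime = $7; machine = $8; next }
        # An attempt ends on SUCCESS or FAILURE; JOBFAILURE alarms are skipped.
        $1 == "SUCCESS" || $1 == "FAILURE" { emit($1, $6, $7) }
        END { flush_job() }
    '
}
```

Usage would be along the lines of `parse_jobs < /dir/filepath/input.txt > /dir/filepath/output.txt`. Because the job name is reset on every job line, each row carries the name of the job it belongs to, and jobs with no runs appear with only the system number and job name filled in. Times will differ by a second from the Time column in places, since ProcessTime is used.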

Thank you thus far, jethrow!

Last edited by atticuss; 03-28-2014 at 12:11 PM..