Split, Search and Reformat by Data Group


 
# 1  
Old 02-14-2011

Hi,

I am writing just to share my appreciation for help I have received from this site in the past.

In a previous post, "Split File by Data Group", I received a lot of help with a troublesome awk script to reformat some complicated data blocks. What I learned came in handy recently when I needed to parse some data from my cluster management logs. We use the LSF Batch System from Platform Computing to manage job execution. The logs from this system look like the following:

Code:

Job <302735>, Job Name <SOME_JOB_NAME_HERE>, User <hsimpson>, Proj
                     ect <default>, Command </path/to/job.sh -sysparm "254 CR66
                     " -log /path/to/logs/log.log -print /path/to/print.txt>
Tue Aug 24 06:56:48: Submitted from host <computer1>, to Queue <Queue_1>, CWD </
                     path/to/scripts>, Re-runnable;
Tue Aug 24 06:56:50: Dispatched to <myserver.com>;
Tue Aug 24 06:56:50: Starting (Pid 22582);
Tue Aug 24 06:56:50: Running with execution home </home/hsimpson>, Execution
                     CWD </path/to/scripts>, Execution Pid <22582>;
Tue Aug 24 06:56:52: Exited with exit code 1. The CPU time used is 0.5 seconds;


Summary of time in seconds spent in various states by  Tue Aug 24 06:56:52
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  2        0        2        0        0        0        4
------------------------------------------------------------------------------

Job <302742>, Job Name <THIS_IS_A_JOB_NAME_HERE>, User <hsimpson
                     >, Project <default>, Command </path/to/job.sh -sysparm 
                     "254 CR66" -log /path/to/logs/log.log -print /path/to/
                     print.txt>
Tue Aug 24 06:57:59: Submitted from host <server4>, to Queue <Queue_2>, CWD </
                     path/to/scripts>, Re-runnable;
Tue Aug 24 06:58:05: Dispatched to <myserver.com>;
Tue Aug 24 06:58:05: Starting (Pid 28265);
Tue Aug 24 06:58:05: Running with execution home </home/hsimpson>, Execution
                     CWD </path/to/scripts>, Execution Pid <28265>;
Tue Aug 24 14:14:42: Done successfully. The CPU time used is 3.9 seconds;
Tue Aug 24 14:14:42: Post job process done successfully;

Summary of time in seconds spent in various states by  Tue Aug 24 14:14:42
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  6        0        26197    0        0        0        26203
------------------------------------------------------------------------------

Job <302845>, Job Name <JOB_NAME_GOES_HERE_HERE_HERE>, User <hsimpson>, Proj
                     ect <default>, Command </path/to/job.sh -sysparm "254 CR66
                     " -log /path/to/logs/log.log -print /path/to/print.txt>
Tue Aug 24 08:16:07: Submitted from host <myserver.com>, to Queue
                     <Queue_3>, CWD </tmp>, Re-runnable;
Tue Aug 24 08:16:11: Dispatched to <myserver.com>;
Tue Aug 24 08:16:11: Starting (Pid 11867);
Tue Aug 24 08:16:11: Running with execution home </home/hsimpson>, Execution CWD <
                     /tmp>, Execution Pid <11867>;
Tue Aug 24 08:38:50: Exited with exit code 1. The CPU time used is 0.6 seconds;


Summary of time in seconds spent in various states by  Tue Aug 24 08:38:50
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  4        0        1359     0        0        0        1363



From each of these blocks of log information, only certain items interest me, and I want to import them into Excel to build stupid pretty charts.

This is what I want in my output:

Code:
302735,Tue Aug 24 06:56:48,Tue Aug 24 06:56:52,2,2,4,SOME,JOB
302742,Tue Aug 24 06:57:59,Tue Aug 24 14:14:42,6,26197,26203,THIS,IS
302845,Tue Aug 24 08:16:07,Tue Aug 24 08:38:50,4,1359,1363,JOB,NAME


This is the code I used to generate the desired output:

Code:
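#AWK

# Print one CSV record per job, once a job number has been seen.
function p() {
    if (jobno != "")
        print jobno "," submit "," done "," pend "," run "," total "," jobname1 "," jobname2
}

# Job header: the job number is the 6 characters after "Job <".
/^Job/ {
    jobno = substr($0, 6, 6)
    split($0, a, ",")          # a[2] is " Job Name <SOME_JOB_NAME_HERE>"
    split(a[2], b, "_")
    sub(/^.*</, "", b[1])      # strip the " Job Name <" prefix from the first word
    jobname1 = b[1]
    jobname2 = b[2]
}

# Submission time: the first 19 characters of the line.
/Submitted/ { submit = substr($0, 1, 19) }

# Completion time: the timestamp at the end of the Summary line.
/^Summary/ { done = substr($0, 56, 19) }

# The values are on the line that follows the PEND header.
/PEND/ {
    getline
    pend = $1
    run = $3
    total = $NF
    p()
}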
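
One thing the script above does not handle is a field that wraps onto the next line, as "User <hsimpson" and ">" do in the second job of the sample. A minimal sketch of a pre-processing step follows, assuming that every continuation line starts with exactly 21 spaces as in the sample log. The function name unwrap and the width variable are my own for illustration, not part of LSF. The pieces are glued together without adding a space, because the log breaks lines in the middle of words ("Proj" / "ect").

```shell
#!/bin/sh
# unwrap: rejoin wrapped continuation lines of LSF-style job records.
# Assumption: a continuation line begins with exactly 21 spaces of indent,
# as in the sample log; change "width" if your output is indented differently.
unwrap() {
  awk -v width=21 '
    BEGIN { indent = sprintf("%" width "s", "") }

    # Continuation line: append the text after the indent to the pending line.
    have && length($0) > width && substr($0, 1, width) == indent {
      buf = buf substr($0, width + 1)
      next
    }

    # Any other line: flush the pending line and start a new one.
    {
      if (have) print buf
      buf = $0
      have = 1
    }

    END { if (have) print buf }
  ' "$@"
}

# Demo on a shortened header adapted from the sample log.
unwrap <<'EOF'
Job <302735>, Job Name <SOME_JOB_NAME_HERE>, User <hsimpson>, Proj
                     ect <default>, Command </path/to/job.sh>
Tue Aug 24 06:56:50: Dispatched to <myserver.com>;
EOF
```

With a script file and log file of your own (the names here are hypothetical), the unwrapped stream can be piped straight into the parser: unwrap joblog.txt | awk -f jobs.awk > jobs.csv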

So once again, thank you to everyone on this site for your continued help in my continued learning efforts.

Sincerely,
Matthew
# 2  
Old 02-14-2011
Code:
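awk '/^Job/ {printf "%s,", substr($2,2,length($2)-3); x=substr($5,2); y=$6}
/Submitted from host/ {printf "%s %s %s %s,", $1, $2, $3, substr($4,1,length($4)-1)}
/^Summary of time/ {printf "%s %s %s %s,", $(NF-3), $(NF-2), $(NF-1), $NF}
# split on whitespace so the columns do not depend on how wide the numbers are
/^ +[0-9]/ {n=split($0,t," "); printf "%s,%s,%s,%s,%s\n", t[1], t[3], t[n], x, y}' FS=" |_" urfile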
302735,Tue Aug 24 06:56:48,Tue Aug 24 06:56:52,2,2,4,SOME,JOB
302742,Tue Aug 24 06:57:59,Tue Aug 24 14:14:42,6,26197,26203,THIS,IS
302845,Tue Aug 24 08:16:07,Tue Aug 24 08:38:50,4,1359,1363,JOB,NAME
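
A note on FS=" |_": because it is a regular expression, every single space counts as a separator. A run of spaces therefore produces empty fields, and the field number of a column in the Summary table shifts with the width of the numbers before it. The sketch below compares that with split() using " ", which falls back to awk's default whitespace splitting. The function name summary_fields and the row with a two-digit PEND value are made up for illustration.

```shell
#!/bin/sh
# A made-up Summary row whose PEND value has two digits.
row='  26       0        1359     0        0        0        1385'

# With the regex FS each space is its own separator: count the fields.
printf '%s\n' "$row" | awk -F' |_' '{ print "fields with regex FS: " NF }'

# split() with " " splits on runs of whitespace, so PEND, RUN and TOTAL
# are always elements 1, 3 and n, whatever the column widths.
summary_fields() {
  awk '{ n = split($0, t, " "); print t[1] "," t[3] "," t[n] }'
}

printf '%s\n' "$row" | summary_fields
```

The second command prints 26,1359,1385, which is the pend,run,total part of the CSV line.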
