Working with individual blocks of text using awk


 
# 1  
Old 02-10-2011

Hi,

I am working with CVS log data that looks like this:

Code:
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointListener.java,v
head: 1.14
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 15;    selected revisions: 2
description:
----------------------------
revision 1.14
date: 2006-06-12 15:42:24 -0500;  author: darin;  state: Exp;  lines: +2 -2;
copyright updates
----------------------------
revision 1.13
date: 2006-05-16 09:34:00 -0500;  author: darin;  state: Exp;  lines: +1 -1;
javadoc spelling errors
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManager.java,v
head: 1.36
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 38;    selected revisions: 4
description:
----------------------------
revision 1.31
date: 2007-03-26 20:47:29 -0500;  author: darin;  state: Exp;  lines: +1 -1;  commitid: 61604608779a4567;
update copyrights
----------------------------
revision 1.30
date: 2007-01-17 09:01:45 -0600;  author: darin;  state: Exp;  lines: +3 -2;  commitid: 614345ae3a564567;
javadoc settings and fixes
----------------------------
revision 1.29
date: 2006-06-12 15:42:24 -0500;  author: darin;  state: Exp;  lines: +2 -2;
copyright updates
----------------------------
revision 1.28
date: 2006-05-16 09:34:00 -0500;  author: darin;  state: Exp;  lines: +1 -1;
javadoc spelling errors
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManagerListener.java,v
head: 1.6
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 6;    selected revisions: 1
description:
----------------------------
revision 1.4
date: 2005-02-23 23:58:22 -0600;  author: darins;  state: Exp;  lines: +1 -1;
CPL --> EPL

A block starts with a line beginning with "RCS file:" and ends with a line of "=" characters.
I used the following command

awk '/\.java,v/,/====/'

and it extracted all the blocks of data for Java files, i.e. each block from its "RCS file:" line through the closing ====== line. This is good.
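A slightly stricter variant of this range pattern can be sketched as follows. It anchors both ends and escapes the dot, so only a line that starts with "RCS file:" and ends in ".java,v" opens a block, and only a line made up entirely of "=" characters closes it. The function name `java_blocks` is hypothetical, chosen just for this sketch.

```shell
# Print each block from an "RCS file: ...java,v" line through the next
# line made up only of "=" characters (or to end of input if none follows).
java_blocks() {
    awk '/^RCS file: .*\.java,v$/,/^=+$/' "$@"
}
```

Called as `java_blocks cvs.log` (a hypothetical file name), it prints the Java blocks; with no argument it reads standard input. If the last block has no closing separator, the range simply runs to the end of the input.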

However, I also want to extract more information from each block and store it.
For example, I want to count how many revisions there are in each block, how many distinct authors worked on each file, how many lines were added and deleted in total for each file, etc.

Can anyone help me extract this information from each block and store it in a tab-separated file? I do not need the individual revision numbers or author names; the counts (number of revisions, number of authors, sum of lines added, etc.) are enough.

Any help on how to start working with these individual blocks would be useful. Do I have to use a for loop to process each block?

Thanks,
Sandeep
# 2  
Old 02-10-2011
Code:
awk -F\; '
/\.java,v/ {file=$0}                        # remember the "RCS file:" line of the current block
/^total revisions:/ {revisions=$1}          # text before the first ";"
# date line: keep the (last) author field, add up the "lines: +A -D" counts
/author:/ {author=$2;split($4,a," ");add+=int(a[2]);del-=int(a[3])}
# separator line: print the block summary and reset the per-block state
/======/ {print file RS revisions RS author RS "Add lines: "add RS "Delete lines: " del;
          revisions=author=add=del=""
         }
{t=$0}                                      # remember the last line read
# the last block may lack a closing separator; print it here
END{ if (t!~/===/) print file RS revisions RS author RS "Add lines: "add RS "Delete lines: " del;}
' infile

Output:
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointListener.java,v
total revisions: 15
  author: darin
Add lines: 3
Delete lines: 3
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManager.java,v
total revisions: 38
  author: darin
Add lines: 7
Delete lines: 6
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManagerListener.java,v
total revisions: 6
  author: darins
Add lines: 1
Delete lines: 1

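The script above keeps only the last author seen in each block and prints one value per line. A possible variant, sketched here under assumptions, counts distinct authors and writes one tab-separated row per file, as asked in the first post. The function name `cvs_summary`, the column names, and the rule that only lines starting with "date: " carry the author and line counts are assumptions made for this sketch; they do not come from the posts above.

```shell
# Summarise a CVS log as tab-separated rows, one row per "RCS file:" block.
cvs_summary() {
    awk '
    # Print one row for the block collected so far, then reset the state.
    function emit_block() {
        if (file != "")
            printf "%s\t%d\t%d\t%d\t%d\t%d\n", file, total, listed, nauthors, added, deleted
        split("", seen)                      # empty the set of author names
        file = ""; total = listed = nauthors = added = deleted = 0
    }
    BEGIN { print "file\ttotal_revisions\tlisted_revisions\tauthors\tlines_added\tlines_deleted" }
    /^RCS file: /        { emit_block(); file = substr($0, 11) }   # path after "RCS file: "
    /^total revisions: / { total = $3 + 0 }                        # "15;" converts to 15
    /^revision [0-9.]+$/ { listed++ }                              # revisions shown in this block
    /^date: / {
        if (match($0, /author: [^;]+/)) {
            name = substr($0, RSTART + 8, RLENGTH - 8)
            if (!(name in seen)) { seen[name] = 1; nauthors++ }    # count each author once
        }
        if (match($0, /lines: \+[0-9]+ -[0-9]+/)) {
            split(substr($0, RSTART, RLENGTH), part, " ")
            added   += substr(part[2], 2)                          # drop the leading "+"
            deleted += substr(part[3], 2)                          # drop the leading "-"
        }
    }
    /^=+$/ { emit_block() }                  # separator line closes a block
    END    { emit_block() }                  # last block may have no separator
    ' "$@"
}
```

Running `cvs_summary infile > summary.tsv` (the output name is hypothetical) needs no explicit loop over blocks: awk already reads the input line by line, so the per-block state lives in ordinary variables that are printed and reset at each separator. Revisions without a "lines:" field still count toward the author total.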
# 3  
Old 02-10-2011
Here's a Perl script for the problem. Change the value of $delim to "\t" for tab-delimited output.

Code:
$ # display the content of the data file "f7"
$ cat f7
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointListener.java,v
head: 1.14
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 15;    selected revisions: 2
description:
----------------------------
revision 1.14
date: 2006-06-12 15:42:24 -0500;  author: darin;  state: Exp;  lines: +2 -2;
copyright updates
----------------------------
revision 1.13
date: 2006-05-16 09:34:00 -0500;  author: darin;  state: Exp;  lines: +1 -1;
javadoc spelling errors
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManager.java,v
head: 1.36
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 38;    selected revisions: 4
description:
----------------------------
revision 1.31
date: 2007-03-26 20:47:29 -0500;  author: darin;  state: Exp;  lines: +1 -1;  commitid: 61604608779a4567;
update copyrights
----------------------------
revision 1.30
date: 2007-01-17 09:01:45 -0600;  author: inigo;  state: Exp;  lines: +3 -2;  commitid: 614345ae3a564567;
javadoc settings and fixes
----------------------------
revision 1.29
date: 2006-06-12 15:42:24 -0500;  author: montoya;  state: Exp;  lines: +2 -2;
copyright updates
----------------------------
revision 1.28
date: 2006-05-16 09:34:00 -0500;  author: darin;  state: Exp;  lines: +1 -1;
javadoc spelling errors
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManagerListener.java,v
head: 1.6
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 6;    selected revisions: 1
description:
----------------------------
revision 1.4
date: 2005-02-23 23:58:22 -0600;  author: darins;  state: Exp;  lines: +1 -1;
CPL --> EPL
=============================================================================
$ # run the Perl script that processes the file "f7"
$ perl -lne 'BEGIN {
               $delim = "|";                    # use "\t" for tab-delimited output
               $added = $deleted = 0;
               print join $delim, ("File", "Total Revisions", "Authors", "Lines Added", "Lines Deleted")
             }
             if (/^RCS file:.*\/(.*?),.*$/) {
               $file = $1;                      # base name without the ",v" suffix
             } elsif (/^total revisions: (\d+);.*$/) {
               $revcount = $1;
             } elsif (/^.*author: (\w+);.*lines: \+(\d+) -(\d+).*$/) {
               $authors{$1}++;                  # hash keys give the distinct authors
               $added += $2;
               $deleted += $3;
             } elsif (/^==+$/) {
               # end of block: print one row (authors sorted), then reset the per-file state
               print join $delim, ($file, $revcount, join(",", sort keys %authors), $added, $deleted);
               $file = "";
               $revcount = "";
               %authors = ();
               $added = 0;
               $deleted = 0;
             }
            ' f7
File|Total Revisions|Authors|Lines Added|Lines Deleted
IBreakpointListener.java|15|darin|3|3
IBreakpointManager.java|38|darin,inigo,montoya|7|6
IBreakpointManagerListener.java|6|darins|1|1

HTH,
tyler_durden
# 4  
Old 02-11-2011
Thanks rdcwayx and Tyler for the replies.

Although I am not well versed in awk or Perl, I understand the code you have given. I will work with it and try to get the other things I want from what you have written.

Thanks again. Really appreciate the help.

Sandeep