[Awk] Extract blocks with a particular pattern


 
# 1  
Old 02-14-2011

Hi,

I have some CVS log files, which are divided into blocks. Each block has many fields of information and I want to extract those blocks with a pattern. Here is the sample input.

Code:
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/DebugPlugin.java,v
head: 1.174
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 181;    selected revisions: 16
description:
----------------------------
revision 1.149
date: 2007-04-16 11:06:45 -0500;  author: darin;  state: Exp;  lines: +51 -132;  commitid: 611546239f144567;
Bug 178902 Setting Stop in main does not stop when launched
----------------------------
revision 1.148
date: 2007-03-26 20:47:29 -0500;  author: darin;  state: Exp;  lines: +1 -1;  commitid: 61604608779a4567;
update copyrights
----------------------------
revision 1.147
date: 2007-01-18 10:57:34 -0600;  author: darin;  state: Exp;  lines: +5 -0;  commitid: 458f45afa6fd4567;
tracing for debug events
----------------------------
revision 1.146
date: 2007-01-17 09:01:45 -0600;  author: darin;  state: Exp;  lines: +7 -0;  commitid: 614345ae3a564567;
javadoc settings and fixes
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/DebugException.java,v
head: 1.17
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 18;    selected revisions: 2
description:
----------------------------
revision 1.14
date: 2006-06-12 15:42:24 -0500;  author: darin;  state: Exp;  lines: +2 -2;
copyright updates
----------------------------
revision 1.13
date: 2006-05-16 09:34:00 -0500;  author: darin;  state: Exp;  lines: +1 -1;
javadoc spelling errors
=============================================================================

After the word "description", there is information for each revision. I only want those revisions where the last field (which is free text) contains the pattern "Bug", "Fix", or "####" (a number without any preceding letters or words). The last field may span one or two lines.

The above input has the data for 2 files. For each file, I want to retain the information up to the word "description", but after that I want the information only for those revisions that match these patterns.

The expected output is

Code:
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/DebugPlugin.java,v
head: 1.174
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 181;    selected revisions: 16
description:
----------------------------
revision 1.149
date: 2007-04-16 11:06:45 -0500;  author: darin;  state: Exp;  lines: +51 -132;  commitid: 611546239f144567;
Bug 178902 Setting Stop in main does not stop when launched
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/DebugException.java,v
head: 1.17
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 18;    selected revisions: 2
description:
=============================================================================

Sorry for the long question. I would appreciate any help.

Thank you very much.

Sandeep

Last edited by sandeepk1611; 02-14-2011 at 06:00 PM.. Reason: wrong output
# 2  
Old 02-14-2011
Code:
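awk '
# Records end at the "====" lines and fields at the "----" lines (a regex RS is not portable to every awk).
BEGIN{RS="==*\n";FS="--*\n"}
# Keep the file header and any revision block that mentions "Bug <n>" or "Fix <n>".
{for (i=1;i<=NF;i++) {if ($i~/(Bug|Fix) [0-9]/||$i~/RCS file:/) print $i OFS}}
' OFS="----------------------------"  infile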
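
When matching keywords such as Bug or Fix, keep in mind that square brackets in an awk regex form a bracket expression, which matches a single character from the set. Parentheses with | give alternation between whole words. Below is a minimal sketch of the difference; the helper name check and the second sample comment are made up for illustration:

```shell
# check TEXT: hypothetical helper that tests one log comment against both patterns.
check() {
  printf '%s\n' "$1" | awk '
    {
      # Bracket expression: ONE character out of B, u, g, |, F, i, x, then " <digit>".
      bracket = ($0 ~ /[Bug|Fix] [0-9]/) ? "yes" : "no"
      # Alternation: the whole word Bug or the whole word Fix, then " <digit>".
      alt     = ($0 ~ /(Bug|Fix) [0-9]/) ? "yes" : "no"
      print "bracket=" bracket " alternation=" alt
    }'
}

check "Bug 178902 Setting Stop in main"   # prints bracket=yes alternation=yes
check "rebuild index 3"                   # prints bracket=yes alternation=no
```

The second call shows the false positive: "x 3" satisfies the bracket expression even though neither keyword is present.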

# 3  
Old 02-14-2011
Great solution rdcwayx,

Just a couple of slight tweaks to stop false positives (Bug/Fix must start the 3rd line) and to support a number without preceding letters (I think that is what #### was supposed to represent):

Code:
awk '
BEGIN{RS="==*\n";FS="--*\n"}
{for (i=1;i<=NF;i++) {if ($i~/[^\n]*\n[^\n]*\n(Bug |Fix |)[0-9]/||$i~/^RCS file:/) print $i OFS}}
' OFS="----------------------------"  infile

# 4  
Old 02-14-2011
Code:
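awk -v p=0 -v label1="----------------------------" -v label2="=============================================================================" '
# p numbers the blocks within a file, x numbers the files, y is the largest block count seen (gensub needs gawk).
$0==label1{p++;if(p>y)y=p}
/RCS file/{x++;p=1}
# Prefix every stored line with "##" so that "##Bug" can only match at the start of a line.
/===*/{p=""}{a[x" "p]=a[x" "p]"\n##"$0}
END{
for(m=1;m<=x;m++) {for(n=1;n<=y;n++) if(a[m" "n]~/RCS file|##Bug|##Fix|##[0-9]/) print gensub("##","","g",a[m" "n]);print label2}
}' file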


Last edited by yinyuemi; 02-14-2011 at 09:21 PM..
# 5  
Old 02-14-2011
A couple of things to keep in mind, in case the solutions don't work for the OP:

1. The use of a regular expression or string in RS is a gawk extension.
2. Since at least one field is free text, it's probably a good idea to anchor the FS regular expression.

Regards,
Alister
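
Both points can be sidestepped by reading the log line by line and comparing each line against the exact delimiter strings, which needs neither a regex RS nor a regex FS. The following is only a sketch under assumptions, not one of the solutions posted in this thread: the function name filter_cvs_log is hypothetical, the two delimiter strings are copied from the sample input, and the free text is assumed to start on the third line of each revision block.

```shell
# filter_cvs_log: hypothetical helper; reads a CVS log on stdin, writes the filtered log.
filter_cvs_log() {
  awk -v revsep="----------------------------" \
      -v filesep="=============================================================================" '
    # Print the buffered revision block only if its comment qualified, then reset.
    function emit() { if (keep) printf "%s", block; block = ""; keep = 0; nline = 0 }

    $0 == filesep { emit(); print; inrev = 0; next }           # end of one file
    $0 == revsep  { emit(); block = $0 "\n"; inrev = 1; next }  # start of a revision
    !inrev        { print; next }                               # header lines pass through
    {
      block = block $0 "\n"
      # Lines 1 and 2 are "revision" and "date"; the free text starts on line 3.
      if (++nline >= 3 && $0 ~ /^((Bug|Fix) )?[0-9]/) keep = 1
    }
    END { emit() }
  '
}
```

Usage would be `filter_cvs_log < infile`. Comparing whole lines with == means a stray "--" or "==" inside the free text cannot be mistaken for a delimiter.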
# 6  
Old 02-14-2011
Agreed, this should anchor things down, and it also keeps the =======* and -----* delimiters from the original file.

Code:
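awk '
# Use the full delimiter lines as RS and FS so they are reproduced exactly in the output.
BEGIN{RS="=============================================================================\n";
 FS="----------------------------";OFS=FS}
# Pass the data through "%s" so that a "%" in the free text is not treated as a format specifier.
{for (i=1;i<=NF;i++) {if($i~/^RCS file:/)printf "%s", $1; if($i~/[^\n]*\n[^\n]*\n(Bug |Fix |)[0-9]/) printf "%s%s", OFS, $i} printf "%s", RS} ' infile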

# 7  
Old 02-15-2011
Thanks everyone for the replies. I will try out these solutions on the data I have.

@Chubler_XL,

I had a question. Why did you first assign FS="------*" and then again OFS=FS? Can you explain that?

Thanks,
Sandeep
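
For readers with the same question: FS tells awk how to split each input record into fields, and the separator itself is consumed in the process. OFS is the string awk writes between fields on output. Setting OFS to the same value as FS is therefore one way to put the original delimiter back when the kept fields are printed. Below is a minimal sketch with a hypothetical helper named rejoin and made-up input:

```shell
# rejoin SEP: hypothetical helper that splits stdin on ":" (FS) and joins the fields with SEP (OFS).
rejoin() {
  # Assigning $1 = $1 forces awk to rebuild $0 from its fields using OFS.
  awk -v sep="$1" 'BEGIN { FS = ":"; OFS = sep } { $1 = $1; print }'
}

printf 'a:b:c\n' | rejoin "-"   # prints a-b-c
printf 'a:b:c\n' | rejoin ":"   # prints a:b:c (OFS equal to FS restores the delimiter)
```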