AWK pattern matching, first and last


 
# 1  
Old 12-21-2007

In a nutshell, I need to work out how to return the last match from an awk range pattern (/start/,/end/). I can bring back the first match, but I am unsure how to obtain the last one, and a simple tail won't work because a match can span multiple lines.

Secondly, I would like some way of searching within a section that has already been matched. The program should return the first section matching /*H,H## from a file and then perform multiple searches for other tags within it (#P,P# and #C,C#, for example). I did think of capturing the /*H section in a variable and searching that, but that converts it to a string without line breaks (I think), and I want to maintain the format of the input.
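
On the line-break concern: a shell variable does keep its newlines as long as it is expanded inside double quotes. The following is a minimal sketch under that assumption; the sample text and the variable names `sample`, `header` and `purpose` are made up for illustration and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical stand-in for one *.code file: a short header followed by code.
sample=$(cat <<'EOF'
/*H####
#P Purpose : demo P#
#M DT  01.11.2007 Initial M#
####H##
int main(void) { return 0; }
EOF
)

# Keep the /*H ... H## section and stop reading once it closes.
header=$(printf '%s\n' "$sample" |
         awk '/\/\*H/,/H##$/ { print } /H##$/ { exit }')

# Quoting "$header" keeps every line break; an unquoted $header would be
# word-split by the shell and collapse onto one line.
printf '%s\n' "$header"

# The captured header can then be searched again for a single tag.
purpose=$(printf '%s\n' "$header" | awk '/^#P/,/P#[ \t]*$/')
printf '%s\n' "$purpose"
```

Whether a variable or a second pipe stage is the better fit probably depends on how many tag searches follow; the variable avoids re-reading the file for each one.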

The code below works in part (although not as described above, and it handles the last #M,M# incorrectly), but there must be a much more efficient way to do it, as it searches the full file every time.
Any general advice on optimising the code would also be useful, as I appreciate there's a lot of piping going on.

The current code, input and output follow:

Code:
#!/bin/ksh 

for file in `find . -type f -name "*.code"` 
do 
  echo ">>>>>>> PROGRAM: " $file 
  echo ================================================================================ 
  awk '/#P/ && ++m==1,/P#/ {print $0}' $file | sed 's/^#[A-Z#] //g' | sed 's/ [A-Z#]#$//g' 
  awk '/#A/ && ++m==1,/A#/ {print $0}' $file | sed 's/^#[A-Z#] //g' | sed 's/ [A-Z#]#$//g' 
  awk '/#C/ && ++m==1,/C#/ {print $0}' $file | sed 's/^#[A-Z#] //g' | sed 's/ [A-Z#]#$//g' 
  awk '/#I/ && ++m==1,/I#/ {print $0}' $file | sed 's/^#[A-Z#] //g' | sed 's/ [A-Z#]#$//g' 
  echo -------------------------------------------------------------------------------- 
  awk '/#D/ && ++m==1,/D#/ {print $0}' $file | sed 's/^#[A-Z#] //g' | sed 's/ [A-Z#]#$//g' 
  awk '/#M/,/M#/ {print $0}' $file | tail -1 | sed 's/^#[A-Z#] //g' | sed 's/ [A-Z#]#$//g' 
  echo 
done

Code:
/*H#############################################################################
##                                                                            ## 
#P Purpose : Creates generation datasets from the original source file        P# 
##                                                                            ## 
#A Author  : D Turpin                                                         A# 
#C Date    : 1st November 2007                                                C# 
##                                                                            ## 
#I Inputs  : data.table1                                                      ## 
##           data.table2                                                      I# 
#O Outputs : data.table3                                                      O# 
##                                                                            ## 
################################################################################ 
## Change History                                                             ## 
#D Who When       Why                                                Version  D# 
##                                                                            ## 
#M DT  01.11.2007 Initial Development                                1.00     M# 
#M DT  15.11.2007 Modified to include new reqs                       1.01     ## 
##                and other things                                            M# 
##                                                                            ## 
#############################################################################H##
*...+....1....+....2....+....3....+....4....+....5....+....6....+....7....+...*/

Code:
>>>>>>> PROGRAM:  ./header.code 
================================================================================ 
Purpose : Creates generation datasets from the original source file 
Author  : D Turpin 
Date    : 1st November 2007 
Inputs  : data.table1 
          data.table2 
-------------------------------------------------------------------------------- 
Who When       Why                                                Version 
               and other things
--------------------------------------------------------------------------------

# 2  
Old 12-21-2007
Code:
#!/bin/ksh 

for file in `find . -type f -name "*.code"` 
do
   echo ">>>>>>> PROGRAM: " $file 
   echo ================================================================================
   awk '/^#P/ || /P#/ {gsub("#P|P#|##",""); print}
        /^#A/ || /A#/ {gsub("#A|A#|##",""); print}
        /^#C/ || /C#/ {gsub("#C|C#|##",""); print}
        /^#I/ || /I#/ {gsub("#I|I#|##",""); print}
        /^#D/ || /D#/ {gsub("#D|D#|##",""); print}
        /^#M/ || /M#/ {gsub("#M|M#|##",""); print}' $file
done

# 3  
Old 12-22-2007
That's much cleaner, and I assume quicker, as it does a single scan. I can't test it at the moment, but I assume it addresses the single pass and the clean-up, and not the returning of either the first or the last match?
The main remaining problem is returning the last #M,M# (or the nth, if possible, for future reference), as I want to show just the last modification to a file.
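
One possible way to return only the last #M,M# block is to buffer each block as it is read and print whatever is left in the buffer at the end. This is a minimal sketch, not a tested solution from the thread; `last_block` is a hypothetical helper name, and it assumes a block starts with `#` plus the tag letter at the beginning of a line and ends with the tag letter plus `#` at the end of a line (trailing blanks allowed).

```shell
#!/bin/sh
# last_block TAG   (reads the file on standard input)
# Prints the last complete #TAG ... TAG# block, which may span several lines.
last_block() {
    awk -v tag="$1" '
        BEGIN { start = "^#" tag; stop = tag "#[ \t]*$" }
        # A new block begins: forget the previously buffered one.
        !inblk && $0 ~ start { inblk = 1; buf = "" }
        # Collect every line of the current block.
        inblk { buf = buf $0 "\n" }
        # The block is complete: remember it as the latest candidate.
        inblk && $0 ~ stop { inblk = 0; last = buf }
        END { printf "%s", last }
    '
}
```

It could be called as `last_block M < header.code`. Because only one block is buffered at a time, memory use stays small even if the file contains many #M entries.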

Furthermore, is there any way to confine the search to the top section of the file, between the /*H and H## lines? If there are many large files to document, that would make it much quicker.
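
Confining the scan to the header can be sketched by letting awk exit as soon as the closing H## line has been printed, so the rest of a large file is never read. `header_only` is a hypothetical helper name, and the sketch assumes the closing line ends in `H##` with no trailing blanks, as in the sample input.

```shell
#!/bin/sh
# header_only FILE...
# Prints only the /*H ... H## section of each file.
header_only() {
    for f in "$@"; do
        # One awk run per file: exit ends the whole awk run, so a single
        # awk over several files would stop after the first header.
        awk '
            /\/\*H/ { inhdr = 1 }   # header opens
            inhdr   { print }       # print header lines, including the last
            /H##$/  { exit }        # header closed: stop reading this file
        ' "$f"
    done
}
```

Any tag-specific rules would go between the opening rule and the `exit` rule, so that they only ever see header lines.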

Thanks for the help so far.
# 4  
Old 12-22-2007
This should give the desired output without trailing spaces:


Code:
#!/bin/ksh

for file in `find . -type f -name "*.code"` 
do
  echo ">>>> Program :" $file
  echo "========================================================================="
  sed -n '/\#P /,/H..$/p' $file |
  sed 's/.*###.*/-------------------------------------------------------------------------/
  s/^#. //g
  s/.#$//g
  s/[ \t]*$//g
  /^$/d'
done


Regards
# 5  
Old 12-22-2007
This solution loses the ability to handle each tag in a custom manner; it simply displays what's already there in a different format. Maybe I should have pointed that requirement out earlier.

Also, I still have the issue of not being able to return only the last #M,M# tag.

In summary, what I want to do is get the /*H,H## section and retrieve tags within this block individually, choosing all, the first or the last occurrence of a tag (or the nth/-nth if possible), e.g. only the modification on the 15th.
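
The requirement summarised above could be approached with one awk pass that stays inside the header, numbers each block of the requested tag, and selects by index at the end. This is only a sketch under assumptions: `tag_block` and its argument order are invented for illustration, tags are single capital letters, and the markers follow the layout of the sample header.

```shell
#!/bin/sh
# tag_block FILE TAG [N]
#   N omitted or 0 : every block of TAG
#   N > 0          : the Nth block from the top
#   N < 0          : the Nth block from the end (-1 is the last)
tag_block() {
    awk -v tag="$2" -v n="${3:-0}" '
        BEGIN { start = "^#" tag; stop = tag "#[ \t]*$" }
        /\/\*H/ { inhdr = 1 }                        # header opens
        inhdr && !inblk && $0 ~ start { inblk = 1; count++ }
        inblk {
            line = $0
            sub(/^#[A-Z#] /, "", line)               # strip leading marker
            sub(/ *[A-Z#]#[ \t]*$/, "", line)        # strip trailing marker and padding
            blk[count] = blk[count] line "\n"
            if ($0 ~ stop) inblk = 0                 # block is complete
        }
        /H##$/ { exit }                              # header closed: stop reading
        END {
            n += 0
            if (n < 0) n = count + n + 1             # turn -1 into the last index
            for (i = 1; i <= count; i++)
                if (n == 0 || i == n) printf "%s", blk[i]
        }
    ' "$1"
}
```

For example, `tag_block header.code M -1` would be expected to print only the most recent modification entry, and `tag_block header.code P` the purpose line. An index that is out of range prints nothing.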
# 6  
Old 12-22-2007
Why not simply use CVS or some other version control software?
# 7  
Old 12-22-2007
Code:
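awk '
# end of the header block: stop reading (exit ends the whole run,
# so this handles one file per awk invocation)
/H##$/{exit}
/#M/,/M#[ \t]*$/ { 
  # assuming Who, When, Why structure 
  # and 3rd field is the "When" column (dd.mm.20yy)
  if ( $3 ~ /^[0-3][0-9]\.[01][0-9]\.20[0-9][0-9]$/) { 
       lastmod = $3      
  }  
}
/#[PACIDM#]/{ 
    # strip the leading and trailing tag markers
    gsub(/#[PACIDM#]|[PACIDM#]#/,"")  
}
/^###*|*\.\.|\/*H|[oO]utputs/{next}
1
END {
  print "Last modified: " lastmod
}
' *code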

output:
Code:
# ./test.sh

 Purpose : Creates generation datasets from the original source file

 Author  : D Turpin
 Date    : 1st November 2007

 Inputs  : data.table1
           data.table2


 Who When       Why                                                Version

 DT  01.11.2007 Initial Development                                1.00
 DT  15.11.2007 Modified to include new reqs                       1.01
                and other things

Last modified: 15.11.2007


Last edited by ghostdog74; 12-22-2007 at 09:28 PM..