The UNIX and Linux Forums  
#1 (permalink)
06-11-2009
Bubnoff

Word boundaries in GAWK?

I wanted to use GAWK's 'word boundary' feature but can't
get it to work. Doesn't GAWK support \<word\>?

Sample record:


Code:
Title                   Bats in the fifth act of Chushingura (top);
                        the world of the bell - the story of Anchin and Kiyohime (bottom)                               
Series Title            Sketches by Yoshitoshi     
Title-Alternative       Yoshitoshi ryakuga: Komori no godanme (top); Kane no sekai (bottom)
Shouldn't /^\<Title\>/ work to exclude "Title-Alternative"? It doesn't; the pattern still matches that line. I have to use this instead:
$1 ~ /^Title$/

Bubnoff

Last edited by Bubnoff; 06-11-2009 at 10:55 PM.. Reason: formatting issue
#2 (permalink)
06-11-2009
ghostdog74 (Forum Advisor)

Because you are escaping \< and \>, whereas your data contains no < or > characters.
#3 (permalink)
06-12-2009
Bubnoff

Update on GAWK boundaries.

Thanks for answering, ghostdog74, but I'm still a bit unclear on
what you mean. I am aware that the data contains no < or > characters; I was trying to use GAWK's word-boundary operators.

According to the documentation (GAWK: Effective ...etc.), the regex operators
\< and \> can be used to indicate word boundaries. They do, but they
seem to use a space as the delimiter (if I had RTFMed a bit closer I
would have saved myself this confusion).

E.g. "Title-Alternative" will be true but "TitleAlternative" will be false.

This still makes no sense. How is this working?

I originally thought I could exclude "Title-Alternative" by using the
word-boundary operators like:

\<Title\>

But since "Title-Alternative" has a hyphen, it still matches (why exactly I can't say). This regex will
exclude "TitleAlternative", which is closer to the example in the docs, but won't exclude "Title-Alternative".

So I think my problem was not fully understanding the way GAWK's word-boundary
operators work (I still don't).

I am new to AWK and am wondering how others would pull "Title" from
a record that looks similar to the one in my post above.

Code:
 gawk '$1 ~ /^Title$/{print}'
The above works. Any insight into these crazy Word Boundary operators in
GAWK would be much appreciated.

Thanks -

Bub
#4 (permalink)
06-12-2009
ghostdog74 (Forum Advisor)

Quote:
Originally Posted by Bubnoff View Post
Thanks for answering ghostdog74, however, I'm still a bit unclear on
what you mean.
My bad, I didn't read your requirement properly. If you want to match whole words, there is no need for a regular expression. Just go through each word and test for it:
Code:
awk '{
    for (i = 1; i <= NF; i++) {
        if ($i == "Title") {    # or: $i ~ /^Title$/
            ........
        }
    }
}'
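
A runnable sketch of the loop above, with the elided body replaced by a print statement purely for illustration (what the original poster intended there is not stated). The sample record is abbreviated, and the names sample and loop_hits are hypothetical.

```shell
# Three shortened records modeled on the sample in post #1.
sample='Title                   Bats in the fifth act of Chushingura (top);
Series Title            Sketches by Yoshitoshi
Title-Alternative       Yoshitoshi ryakuga: Komori no godanme (top)'

loop_hits=$(printf '%s\n' "$sample" | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i == "Title") {            # exact string comparison, no regex
            print NR ": field " i       # illustrative action: record and field number
            break                       # report each record at most once
        }
    }
}')

printf '%s\n' "$loop_hits"
```

Because the loop tests every field, it also reports the "Series Title" record (field 2). Testing only $1, as in post #3, is the tighter check when the element name should appear in the first field.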
#5 (permalink)
06-12-2009
Bubnoff

Forgot to mention the case possibilities -

Each test has to take into account possible capitalization ( or lack thereof ). So actually, I've been using:

Code:
/^[Tt]itle$/
The background to this is that I have around one hundred Dublin Core
records to analyze, and the elements I'm testing for are always in
field $1, with the values in fields $2 or $3. "Title" is one of around 15 DC elements I'm testing for, plus or minus the screwy ones people insist on adding. Some catalogers capitalize and others do not.

I could use another GNU Awk feature, though:

Code:
 gawk -v IGNORECASE=1 '$1 == "title"{print $1}' test.notes
as you suggest, instead of the regex version:
Code:
 gawk '$1 ~ /^[Tt]itle$/{print}' test.notes
The regex spelling is quicker, though.

To distinguish "title" from "Title Alternative" or "title alternative", I am using:

Code:
 gawk '$1 ~ /^[Tt]itle$/ && $2 !~ /[Aa]lternative/ {print $1}' test.notes
Thanks for the replies -

Bub

Last edited by Bubnoff; 06-12-2009 at 03:16 AM.. Reason: Forgot case.
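
A portable alternative to the gawk-only IGNORECASE variable is to fold the field to lower case with tolower(), which POSIX awk provides. The sketch below uses made-up records and hypothetical names (records, ci_hits); it is one possible approach, not the one used in the thread.

```shell
# Hypothetical element names in mixed capitalization.
records='Title              Bats in the fifth act
title              kane no sekai
TITLE              Sketches
Title-Alternative  Komori no godanme'

# Compare the lower-cased first field against a lower-case literal.
ci_hits=$(printf '%s\n' "$records" | awk 'tolower($1) == "title" { print $1 }')

printf '%s\n' "$ci_hits"
```

Unlike /^[Tt]itle$/, this also accepts fully upper-case spellings such as TITLE, which may or may not be wanted for a given set of records.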
#6 (permalink)
06-12-2009
Ygor (Moderator)

From GNU Regexp Operators - The GNU Awk User's Guide
Quote:
a word is a sequence of one or more letters, digits, or underscores
So hyphens are out.
#7 (permalink)
06-12-2009
Bubnoff

GAWK boundaries

Thanks Ygor!

I'm embarrassed to say I read this section at least twice, today alone, and didn't catch that. It's times like these when a person should just step
away from the screen, grab a cup o' joe and go for a walk.

Bub