Changing exact matches with awk from subfields


 
# 1  
Old 01-24-2012

To give some context for my issue, here is some sample dummy data. The field delimiter is "<-->". The 4th field holds the tags for my notes. The tags should always be unique and sorted alphabetically.
Code:
1<-->01/20/12<-->01/20/12<-->1st note<-->1st note<-NL->2 lines
2<-->01/20/12<-->01/20/12<-->2nd note<-->2nd note<-NL->more lines<-NL->here
3<-->01/20/12<-->01/20/12<-->more notes<-->3rd notes, single line
6<-->01/20/12<-->01/20/12<-->what??<-->another notes<-NL->full of <-NL->more<-NL->line

The goal is to update a single tag across all my notes. The problem is that I can't find a way to match only whole words. For example, if I change the word "note" to "notez", then on the 3rd line "notes" is changed to "notezs".

How can I change only whole words in field #4? I've tried awk's sub and index functions, different regular expressions, and attempts to combine awk with sed. The environment I'm working in is ksh88, and I'd prefer to use just awk if possible. Perl sadly isn't an option.
# 2  
Old 01-24-2012
If your awk supports strings as separators:

Code:
awk -v FS="<-->" -v OFS="<-->" '{ sub(/note$/, "notez", $4); } 1'

The $ in this case means 'end of the field', not 'end of the line', and the substitution happens only in the fourth field. You could use a loop, for(N=1; N<=NF; N++) sub(/note$/, "notez", $N), to substitute in every field that ends with 'note'.
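As a rough illustration of the loop described above, here is a minimal sketch run against two rows from the sample data. It assumes an awk that accepts -v and a multi-character FS; the variable name result is only for the example. Note that the pattern is anchored at the end of the field but not at the start of a word, so "mynote" is changed as well.

Code:
```shell
# Apply the end-of-field substitution to every field, not just field 4.
# Two rows from the thread's sample data are fed in with printf.
result=$(printf '%s\n' \
  '1<-->01/20/12<-->01/20/12<-->1st note<-->1st note<-NL->2 lines' \
  '8<-->01/20/12<-->01/20/12<-->mynote<-->another notes<-NL->full of <-NL->more<-NL->line' |
  awk -v FS="<-->" -v OFS="<-->" '{
      # $ anchors to the end of each field because sub() is given $N
      for (N = 1; N <= NF; N++) sub(/note$/, "notez", $N)
  } 1')
printf '%s\n' "$result"
```

In the first row only the 4th field ends in "note", so only that field changes; the second row shows why an end-of-field anchor alone is not a whole-word match.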
# 3  
Old 01-24-2012
Would that also work when "note" is not the last word in the field? It could very well be the first, the last, or anywhere in between. The problem I ran up against with regular expressions was finding a way to specify the delimiters (" " or ">" before the word, " " or "<" after it) without replacing them along with the word. Using [ >]note[ <] also replaces any spaces or brackets next to "note".
# 4  
Old 01-24-2012
I don't see how to do that in awk... you can match them, but you only want part of the match!

sed maybe:

Code:
sed 's/note\([^a-z]\|$\)/notez\1/g'

It will match note followed by any non-alphabetic character, put a 'z' on the end, and put the non-alphabetic character back where it belongs.
# 5  
Old 01-24-2012
Code:
> cat notes.data | sed 's/note\([^a-z]\|$\)/notez\1/g'
1<-->01/20/12<-->01/20/12<-->1st note<-->1st note<-NL->2 lines
2<-->01/20/12<-->01/20/12<-->2nd note<-->2nd note<-NL->more lines<-NL->here
3<-->01/20/12<-->01/20/12<-->more notes<-->3rd notes, single line
6<-->01/20/12<-->01/20/12<-->what??<-->another notes<-NL->full of <-NL->more<-NL->line

Hmm, not seeing the expected results.
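One possible explanation, not confirmed in the thread: the \| alternation is a GNU extension to basic regular expressions, so a sed without it may find no match at all. A sketch of the same idea without \|, splitting the two cases into separate -e expressions, might look like this. The row numbered 9 is a made-up line added only to exercise the end-of-line case, and result is just a name for the example.

Code:
```shell
# Same intent as note\([^a-z]\|$\), written without the \| extension:
#   1st expression: "note" followed by a non-lowercase character (put back via \1)
#   2nd expression: "note" at the very end of the line
result=$(printf '%s\n' \
  '1<-->01/20/12<-->01/20/12<-->1st note<-->1st note<-NL->2 lines' \
  '3<-->01/20/12<-->01/20/12<-->more notes<-->3rd notes, single line' \
  '9<-->01/20/12<-->01/20/12<-->tag<-->ends with note' |
  sed -e 's/note\([^a-z]\)/notez\1/g' -e 's/note$/notez/')
printf '%s\n' "$result"
```

This still changes every matching "note" on the line, including the one in field 5, and it does not check what comes before the word.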

---------- Post updated at 04:12 PM ---------- Previous update was at 01:43 PM ----------

Updated the data and the sed statement:
Code:
1<-->01/20/12<-->01/20/12<-->1st note<-->1st note<-NL->2 lines
2<-->01/20/12<-->01/20/12<-->2nd note<-->2nd note<-NL->more lines<-NL->here
3<-->01/20/12<-->01/20/12<-->more notes<-->3rd notes, single line
6<-->01/20/12<-->01/20/12<-->what??<-->another notes<-NL->full of <-NL->more<-NL->line
7<-->01/20/12<-->01/20/12<-->my_note<-->another notes<-NL->full of <-NL->more<-NL->line
8<-->01/20/12<-->01/20/12<-->mynote<-->another notes<-NL->full of <-NL->more<-NL->line

Code:
> cat notes.data | sed 's/\([^a-z]\)note\([^a-z]\)/\1notez\2/'   
1<-->01/20/12<-->01/20/12<-->1st notez<-->1st note<-NL->2 lines
2<-->01/20/12<-->01/20/12<-->2nd notez<-->2nd note<-NL->more lines<-NL->here
3<-->01/20/12<-->01/20/12<-->more notes<-->3rd notes, single line
6<-->01/20/12<-->01/20/12<-->what??<-->another notes<-NL->full of <-NL->more<-NL->line
7<-->01/20/12<-->01/20/12<-->my_notez<-->another notes<-NL->full of <-NL->more<-NL->line
8<-->01/20/12<-->01/20/12<-->mynote<-->another notes<-NL->full of <-NL->more<-NL->line

As you can see, this works for all but line #7: "my_note" becomes "my_notez". Any thoughts on how to include underscores in my exclusion? Or, for that matter, any character that is not a space, a "<" or a ">"?

Thanks!

---------- Post updated at 04:29 PM ---------- Previous update was at 04:12 PM ----------

Code:
> cat notes.data | sed 's/\([ <>]\)note\([ <>]\)/\1notez\2/'  
1<-->01/20/12<-->01/20/12<-->1st notez<-->1st note<-NL->2 lines
2<-->01/20/12<-->01/20/12<-->2nd notez<-->2nd note<-NL->more lines<-NL->here
3<-->01/20/12<-->01/20/12<-->more notes<-->3rd notes, single line
6<-->01/20/12<-->01/20/12<-->what??<-->another notes<-NL->full of <-NL->more<-NL->line
7<-->01/20/12<-->01/20/12<-->my_note<-->another  notes<-NL->full of <-NL->more<-NL->line
8<-->01/20/12<-->01/20/12<-->mynote<-->another notes<-NL->full of <-NL->more<-NL->line

I think I found my answer. The only problem is that it edits the 1st occurrence on the line, not just the one in field #4.

Last edited by adamreiswig; 01-24-2012 at 06:10 PM..
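For the remaining problem (restricting the change to field #4), one awk-only approach is to split the field into words and compare each word exactly, which avoids regular-expression boundaries altogether. The following is only a sketch under some assumptions: tags within field 4 are separated by spaces, the awk accepts -v and a multi-character FS (on some older systems this may be installed as nawk), and the scratch file path and the names old, new, and result are made up for the example.

Code:
```shell
# Sample rows from the thread, written to a scratch file (hypothetical path).
data=/tmp/notes.sample.$$
cat > "$data" <<'EOF'
1<-->01/20/12<-->01/20/12<-->1st note<-->1st note<-NL->2 lines
3<-->01/20/12<-->01/20/12<-->more notes<-->3rd notes, single line
7<-->01/20/12<-->01/20/12<-->my_note<-->another notes<-NL->full of <-NL->more<-NL->line
8<-->01/20/12<-->01/20/12<-->mynote<-->another notes<-NL->full of <-NL->more<-NL->line
EOF

result=$(awk -v old="note" -v new="notez" '
BEGIN { FS = "<-->"; OFS = "<-->" }
NF >= 4 {
    # Break field 4 into space-separated words and rebuild it,
    # swapping only words that equal the old tag exactly.
    n = split($4, word, " ")
    tags = ""
    for (i = 1; i <= n; i++) {
        if (word[i] == old) word[i] = new
        tags = (i == 1) ? word[i] : tags " " word[i]
    }
    $4 = tags
}
{ print }
' "$data")
printf '%s\n' "$result"
rm -f "$data"
```

Because the comparison is a plain string equality, "notes", "my_note" and "mynote" are left alone, and field 5 is never touched. Two caveats: runs of spaces inside field 4 collapse to a single space when the field is rebuilt, and the sketch does not re-sort or de-duplicate the tags after the rename.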